Web-based Safari Review System Development using Microblog Analyzed Data

https://doi.org/10.19000/0002000492
Name / File: PhD_Thesis_SILAA (PhD_Thesis_SILAA _Sept4.pdf, 5.8 MB)
Item type Thesis or Dissertation(1)
Publication date 2023-10-03
Title
Title Web-based Safari Review System Development using Microblog Analyzed Data
Language en
Title
Title マイクロブログの解析データを利用したWebベースのサファリレビューシステム開発
Language ja
Language
Language eng
Resource type
Resource type identifier http://purl.org/coar/resource_type/c_db06
Resource type doctoral thesis
ID registration
ID registration 10.19000/0002000492
ID registration type JaLC
Access rights
Access rights open access
Access rights URI http://purl.org/coar/access_right/c_abf2
Author Victor Alex Silaa

Abstract
Description type Abstract
Description In this study, I propose using online microblogs as review supplements and demonstrate their
applicability through a tourist support system designed to provide additional opinions
and up-to-date points of interest for lesser-known tourist spots. To realize this proposal, I use
techniques based on Information Extraction (IE), Artificial Intelligence (AI), and Natural Language
Processing (NLP). The proposed approach has three parts.
First, I use geotagged tweets. Tweets that contain geolocation information
are considered geotagged and are therefore treated as possible on-the-spot tourist opinions. The main
challenge, however, is confirming the authenticity of the extracted tweets. This stage uses
location clustering and classification techniques. Specifically, the extracted geotagged tweets
are clustered by location information and then annotated, taking into consideration specific
features used by machine learning-based classification techniques. As the machine learning
(ML) algorithm, I adopt a fine-tuned BERT model, a transformer-based neural network that
uses the context on both sides of each token for better classification.
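
The location-clustering step could be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the clustering algorithm, so a simple greedy, radius-based grouping on great-circle distance is used here, and the names `haversine_km`, `cluster_by_location`, and the 1 km radius are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def cluster_by_location(points, radius_km=1.0):
    """Greedy clustering (an assumption, not the thesis method).

    Each point joins the first cluster whose current centroid lies within
    radius_km; otherwise it starts a new cluster.
    """
    clusters = []  # each cluster is a list of (lat, lon) points
    for p in points:
        for c in clusters:
            centroid = (sum(q[0] for q in c) / len(c), sum(q[1] for q in c) / len(c))
            if haversine_km(p, centroid) <= radius_km:
                c.append(p)
                break
        else:
            clusters.append([p])
    return clusters
```

Each resulting cluster would then be a candidate tourist spot whose tweets are annotated and passed to the classifier.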
Second, I study the geolocatability of ungeotagged tweets so that they can be used as review
alternatives. Ungeotagged tweets have no geolocation information attached, so they are difficult to
associate with a specific location. Furthermore, Twitter data is typically noisy and contains
ungrammatical or informal phraseology and non-standard vocabulary, which additionally causes
a feature sparsity problem, resulting in low classifier performance.
To address this, I propose a two-stage process: a transformer-based model for the
classification of primary tweets, and a combination of impact words, such as location mentions or event
mentions, for location inference. Additionally, I evaluate a range of pre-processing techniques for text
categorization to identify the set that collectively contributes most to improving
prediction accuracy. The resulting classification framework relies on a fine-tuned transformer
neural network model that learns from tweet contents and predicts the locations from which those
tweets were sent, with application limited to the detection of widely known general locations
such as tourist spots. I found that a pre-trained DistilBERT language model, with an average F1 score
of 0.84, outperformed the other tested models across the differently pre-processed datasets.
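
One possible pre-processing variant is sketched below. The abstract does not list the techniques that were compared, so the specific steps (lowercasing, removing URLs and user mentions, optional hashtag handling) and the function name `preprocess` are assumptions chosen for illustration.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_WORD_RE = re.compile(r"[^a-z0-9\s]")


def preprocess(tweet, keep_hashtag_words=True):
    """Normalize a noisy tweet into a whitespace-separated lowercase string.

    keep_hashtag_words=True keeps the word of a hashtag ("#Serengeti" -> "serengeti");
    False drops the whole hashtag. Toggling it yields two pre-processing variants.
    """
    text = tweet.lower()
    text = URL_RE.sub(" ", text)        # drop links
    text = MENTION_RE.sub(" ", text)    # drop @user mentions
    if keep_hashtag_words:
        text = text.replace("#", " ")   # keep the hashtag word, drop the symbol
    else:
        text = re.sub(r"#\w+", " ", text)
    text = NON_WORD_RE.sub(" ", text)   # drop punctuation and other symbols
    return " ".join(text.split())       # collapse repeated whitespace
```

Running the same classifier on the outputs of several such variants is one way to compare their effect on prediction accuracy.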
Furthermore, I evaluate the effect of impact words, such as location mentions and event mentions, on
geolocation estimation, and the change in model accuracy when impact words are included or
removed. To investigate the effect of impact words on a classification model, I first computed
word weights using TF-IDF and then created a likelihood wordlist. I found that model
accuracy improved by as much as 6% when impact words were included compared to when they
were removed, which suggests that impact words have a positive influence on geolocatability. I also found
wrongly weighted impact words that contribute negatively to model performance; eliminating
them improved the model F1 score by 3%.
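
The word-weighting and removal experiment might look roughly like the sketch below. It assumes the common TF-IDF form tf × log(N/df) and ranks terms by their highest weight to form a candidate wordlist; the exact weighting variant and the construction of the likelihood wordlist are not given in the abstract, and `tfidf`, `top_terms`, and `remove_words` are hypothetical names.

```python
import math
from collections import Counter


def tfidf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return weights


def top_terms(docs, k=3):
    """Candidate wordlist: the k terms with the highest TF-IDF weight in any document."""
    best = {}
    for w in tfidf(docs):
        for term, value in w.items():
            best[term] = max(best.get(term, 0.0), value)
    return sorted(best, key=lambda t: (-best[t], t))[:k]


def remove_words(tokens, wordlist):
    """Ablation helper: drop the listed words before training or evaluation."""
    return [t for t in tokens if t not in wordlist]
```

Training the classifier once on the original tokens and once on the output of `remove_words` gives the with/without comparison described above.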
Third, I demonstrate the applicability of these two approaches by designing a tourist support
system and mapping extracted opinions to their respective tourist spots as touristic information.
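
The mapping step in the support system can be illustrated with a small sketch. It assumes each tweet arrives as a hypothetical `(text, predicted_spot, confidence)` tuple from the classifier; the threshold, the tuple layout, and the name `map_opinions_to_spots` are illustrative and not taken from the thesis.

```python
from collections import defaultdict


def map_opinions_to_spots(classified_tweets, min_confidence=0.5):
    """Group tweet texts under their predicted tourist spot.

    classified_tweets: iterable of (text, predicted_spot, confidence).
    Tweets below min_confidence are skipped; within a spot, the most
    confident opinions come first.
    """
    by_spot = defaultdict(list)
    for text, spot, confidence in classified_tweets:
        if confidence >= min_confidence:
            by_spot[spot].append((confidence, text))
    return {
        spot: [text for _, text in sorted(items, key=lambda item: -item[0])]
        for spot, items in by_spot.items()
    }
```

A web front end could then render each spot's list as its supplementary review section.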
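Language en
Bibliographic information
p. 1, Issue date 2023-09
Author version flag
Language en
Value ETD
Degree name
Language ja
Degree name Doctor of Engineering (博士(工学))
Degree-granting institution
Degree-granting institution identifier scheme kakenhi
Degree-granting institution identifier 10106
Language ja
Degree-granting institution name Kitami Institute of Technology (北見工業大学)
Degree number
Degree number 甲第211号
Graduate school / major name
Graduate school / major name 生産基盤工学専攻
Date of degree conferral
Date of degree conferral 2023-09-05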
Versions

Ver.1 2023-10-03 01:12:25.764961