Web-based Safari Review System Development using Microblog Analyzed Data

Victor Alex Silaa

マイクロブログの解析データを利用したWebベースのサファリレビューシステム開発

https://doi.org/10.19000/0002000492

Name / File	License	Action
PhD_Thesis_SILAA _Sept4.pdf (5.8 MB)

Item type

Thesis or Dissertation(1)

Publication date

2023-10-03

Title

Web-based Safari Review System Development using Microblog Analyzed Data

Language

Title

マイクロブログの解析データを利用したWebベースのサファリレビューシステム開発

Language

eng

Resource type

Resource

http://purl.org/coar/resource_type/c_db06

Type

doctoral thesis

ID registration

10.19000/0002000492

ID registration type

JaLC

Access rights

open access

Access rights URI

http://purl.org/coar/access_right/c_abf2

Author

Victor Alex Silaa

Abstract

Description type

Abstract

Description

In this study, I propose the use of online microblogs as review supplements and demonstrate their
applicability through a tourist support system designed to provide additional opinions
and up-to-date points of interest for lesser-known tourist spots. To realize this proposal, I use
techniques based on Information Extraction (IE), Artificial Intelligence (AI), and Natural Language
Processing (NLP). The proposed approach has three parts.
First, I use geotagged tweets. Tweets that contain geolocation information
are considered geotagged and are therefore treated as possible on-the-spot tourist opinions. The main
challenge, however, is to confirm the authenticity of the extracted tweets. This stage uses
location clustering and classification techniques. Specifically, extracted geotagged tweets
are clustered by location information and then annotated according to specific
features that are used in machine learning-based classification. As the machine learning
(ML) algorithm, I adopt a fine-tuned BERT model, a transformer neural network that
draws on the context on both sides of each token for better classification.
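The location-clustering step can be pictured with a minimal sketch. The abstract does not name the clustering algorithm, so the greedy radius-based grouping below is an assumption chosen for brevity, and the function names, the 5 km radius, and the sample coordinates are all hypothetical.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def cluster_by_location(tweets, radius_km=5.0):
    """Greedy clustering (an assumed stand-in, not the thesis's method):
    a tweet joins the first cluster whose seed point lies within radius_km,
    otherwise it starts a new cluster."""
    clusters = []  # each cluster: {"seed": (lat, lon), "tweets": [...]}
    for tweet in tweets:
        point = (tweet["lat"], tweet["lon"])
        for cluster in clusters:
            if haversine_km(cluster["seed"], point) <= radius_km:
                cluster["tweets"].append(tweet)
                break
        else:
            clusters.append({"seed": point, "tweets": [tweet]})
    return clusters

# Hypothetical sample data; coordinates and texts are made up for illustration.
sample = [
    {"text": "Lions at sunrise!", "lat": -2.333, "lon": 34.833},
    {"text": "Wildebeest everywhere", "lat": -2.340, "lon": 34.840},
    {"text": "Crater view is unreal", "lat": -3.160, "lon": 35.580},
]
clusters = cluster_by_location(sample)
```

Each resulting cluster could then be annotated and passed to a classifier; a density-based method would be a reasonable alternative when spot boundaries are irregular.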
Second, I study the geolocatability of ungeotagged tweets so that they can be used as review
alternatives. Ungeotagged tweets have no geolocation information attached, so they are difficult to
associate with a specific location. Furthermore, Twitter data is typically noisy and consists of
ungrammatical or informal phraseology and non-standard vocabulary, which additionally causes
a feature sparsity problem, resulting in low classifier performance.
To address this, I propose a two-stage process: a transformer-based model for the
classification of primary tweets, and a combination of impact words, such as location mentions or event
mentions, for location inference. Additionally, I evaluate a range of pre-processing techniques for text
categorization to identify a set that collectively improves
prediction accuracy. The classification framework relies on a fine-tuned transformer
neural network model that learns from tweet contents and predicts the locations from which those
tweets were sent, with application limited to the detection of widely known general locations
such as tourist spots. I found that a pre-trained DistilBERT language model, with an average F1 score
of 0.84, outperformed the other tested models across the datasets produced by different pre-processing techniques.
Furthermore, I evaluated the effect of impact words, such as location mentions and event mentions, on
geolocation estimation by comparing model accuracy when impact words are included and when they are
removed. To investigate the effect of impact words on a classification model, I first computed
word weights using TF-IDF and then created a likelihood wordlist. I observed a model
accuracy improvement of as much as 6% when impact words are included compared with when they
are removed, which suggests that impact words have a positive influence on geolocatability. I also discovered
wrongly weighted impact words that contribute negatively to model performance; by eliminating them, the model F1 score improved by 3%.
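The TF-IDF weighting and the include-versus-remove comparison can be illustrated with a small sketch. It assumes the common formulation tf = count / document length and idf = log(N / df); the abstract does not say which variant was used or how the likelihood wordlist was built, so `tfidf`, `strip_impact_words`, and the sample tokens are hypothetical.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {word: weight} dict per tokenized document.
    Assumed variant: tf = count / len(doc), idf = log(N / df)."""
    n = len(docs)
    # Document frequency: number of documents containing each word.
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({w: (c / len(doc)) * math.log(n / df[w])
                        for w, c in counts.items()})
    return weights

def strip_impact_words(doc, impact_words):
    """Remove candidate impact words, e.g. to build the 'removed' condition
    of an ablation that compares classifier accuracy with and without them."""
    return [w for w in doc if w not in impact_words]

# Hypothetical tokenized tweets, made up for illustration.
docs = [
    ["the", "lodge", "view"],
    ["the", "river", "crossing"],
    ["the", "lodge", "pool"],
]
weights = tfidf(docs)
ablated = strip_impact_words(docs[0], {"lodge"})
```

Words that occur in every document receive zero weight under this formulation, while rarer words such as a specific place or event name score higher, which is why they are natural candidates for an impact-word list.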
Third, I demonstrate the applicability of these two approaches by designing a tourist support
system and mapping extracted opinions to their respective tourist spots as touristic information.

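Language

Bibliographic information

p. 1, issue date 2023-09

Author version flag

Language

Value

ETD

Degree name

Language

Degree name

Doctor of Engineering (博士（工学）)

Degree-granting institution

Degree-granting institution identifier scheme

kakenhi

Degree-granting institution identifier

10106

Language

Degree-granting institution name

Kitami Institute of Technology (北見工業大学)

Degree number

甲第211号

Graduate school / major

生産基盤工学専攻

Date of degree conferral

2023-09-05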

Versions

Ver.1

2023-10-03 01:12:25.764961

Cite as

Victor Alex Silaa, 2023, マイクロブログの解析データを利用したWebベースのサファリレビューシステム開発: 1– p.
