WEKO3


MiNgMatch—A Fast N-gram Model for Word Segmentation of the Ainu Language

https://kitami-it.repo.nii.ac.jp/records/8938
File Information 2019, 10(10), 317 (362.8 kB)
Item type Journal Article
Publication date 2020-11-02
Title
Language en
Title MiNgMatch—A Fast N-gram Model for Word Segmentation of the Ainu Language
Language
Language eng
Keyword
Subject scheme Other
Subject word segmentation
Keyword
Subject scheme Other
Subject tokenization
Keyword
Subject scheme Other
Subject language modelling
Keyword
Subject scheme Other
Subject n-gram models
Keyword
Subject scheme Other
Subject Ainu language
Keyword
Subject scheme Other
Subject endangered languages
Keyword
Subject scheme Other
Subject under-resourced languages
Resource type
Resource http://purl.org/coar/resource_type/c_6501
Type journal article
Access rights
Access rights open access
Access rights URI http://purl.org/coar/access_right/c_abf2
Author Nowakowski, Karol (WEKO 90480)
Author Ptaszynski, Michal (WEKO 90424)
Author Masui, Fumito (WEKO 90367)
Author alternative name
Name プタシンスキ, ミハウ
Language ja
Author alternative name
Name 桝井, 文人
Language ja
Abstract
Description type Abstract
Description Word segmentation is an essential task in automatic language processing for languages where there are no explicit word boundary markers, or where space-delimited orthographic words are too coarse-grained. In this paper we introduce the MiNgMatch Segmenter, a fast word segmentation algorithm that reduces the problem of identifying word boundaries to finding the shortest sequence of lexical n-grams matching the input text. To validate our method in a low-resource scenario involving extremely sparse data, we tested it with a small corpus of text in the critically endangered language of the Ainu people living in northern parts of Japan. Furthermore, we performed a series of experiments comparing our algorithm with systems utilizing state-of-the-art lexical n-gram-based language modelling techniques (namely, the Stupid Backoff model and a model with modified Kneser-Ney smoothing), as well as a neural model performing word segmentation as character sequence labelling. The experimental results demonstrate the high performance of our algorithm, comparable with the other best-performing models. Given its low computational cost and competitive results, we believe that the proposed approach could be extended to other languages, and possibly also to other Natural Language Processing tasks, such as speech recognition.
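
The core idea stated in the abstract, finding the shortest sequence of lexical n-grams that matches the input text, can be illustrated with a small dynamic-programming sketch. This is not the authors' implementation: the function names `build_table` and `segment`, the `max_len` cutoff, the tie-breaking rule (first candidate found wins), and the toy English n-grams are all assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Tuple


def build_table(ngrams: List[str]) -> Dict[str, str]:
    """Map each n-gram with its spaces removed to its segmented form.

    Hypothetical helper: the n-gram list stands in for n-grams
    extracted from a segmented training corpus.
    """
    return {ng.replace(" ", ""): ng for ng in ngrams}


def segment(text: str, table: Dict[str, str], max_len: int = 30) -> Optional[str]:
    """Cover `text` with the fewest n-grams from `table`.

    Returns the segmented string, or None if no full cover exists.
    `max_len` (an assumed parameter) bounds the unsegmented length
    of a single n-gram that is tried at each position.
    """
    n = len(text)
    # best[i] holds (n-gram count, segmented pieces) for the prefix text[:i].
    best: List[Optional[Tuple[int, List[str]]]] = [None] * (n + 1)
    best[0] = (0, [])
    for i in range(n):
        if best[i] is None:
            continue  # prefix text[:i] cannot be covered
        count, pieces = best[i]
        for j in range(i + 1, min(n, i + max_len) + 1):
            chunk = text[i:j]
            if chunk in table:
                # Keep the candidate only if it uses strictly fewer n-grams.
                if best[j] is None or count + 1 < best[j][0]:
                    best[j] = (count + 1, pieces + [table[chunk]])
    if best[n] is None:
        return None
    return " ".join(best[n][1])


# Toy data (English, purely illustrative; not taken from the paper's corpus).
table = build_table(
    ["the", "cat", "the cat", "cat sat", "sat", "on", "mat", "on the mat"]
)
print(segment("thecatsatonthemat", table))  # the cat sat on the mat
```

In this toy run the input is covered by three n-grams rather than six single words, which is the sense in which the shortest matching sequence determines the word boundaries. How the published algorithm handles unmatched spans or ties is not described in this record, so the sketch simply returns `None` when no full cover exists.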
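Bibliographic information en : Information

Volume 10, Issue 10, p. 317, Issue date 2019
ISSN
Source identifier type EISSN
Source identifier 2078-2489
DOI
Related identifier
Identifier type DOI
Related identifier https://doi.org/10.3390/info10100317
Publisher
Publisher MDPI
Author version flag
Value publisher
Version type
Version type VoR
Version type resource http://purl.org/coar/version/c_970fb48d4fbd8a85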
Versions

Ver.1 2021-03-01 06:11:56.056232
Powered by CERN Data Centre & Invenio

