
dc.contributor.author: Jain, Shubham
dc.contributor.author: de Buitléir, Amy
dc.contributor.author: Fallon, Enda
dc.date.accessioned: 2021-04-06T09:51:13Z
dc.date.available: 2021-04-06T09:51:13Z
dc.date.copyright: 2021
dc.date.issued: 2021-03-10
dc.identifier.citation: Jain, S., de Buitléir, A., Fallon, E. (2021). An extensible parsing pipeline for unstructured data processing. In 2021 23rd International Conference on Advanced Communication Technology (ICACT). 7-10 February. DOI: 10.23919/ICACT51234.2021.9370654 (en_US)
dc.identifier.uri: http://research.thea.ie/handle/20.500.12065/3555
dc.description.abstract: Network monitoring and diagnostics systems depict the running system's state and generate enormous amounts of unstructured data through log files, print statements, and other reports. It is not feasible to manually analyze all these files due to limited resources and the need to develop custom parsers to convert unstructured data into desirable file formats. Prior research focuses on rule-based and relationship-based parsing methods to parse unstructured data into structured file formats; these methods are labor-intensive and need large annotated datasets. This paper presents an unsupervised text processing pipeline that analyses such text files, removes extraneous information, identifies tabular components, and parses them into a structured file format. The proposed approach is resilient to changes in the data structure, does not require training data, and is domain-independent. We experiment and compare topic modeling and clustering approaches to verify the accuracy of the proposed technique. Our findings indicate that combining similarity and clustering algorithms to identify data components had better accuracy than topic modeling. (en_US)
dc.format: PDF (en_US)
dc.publisher: IEEE (en_US)
dc.relation.ispartof: 2021 23rd International Conference on Advanced Communication Technology (ICACT) (en_US)
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International (*)
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ (*)
dc.subject: Unsupervised data mining (en_US)
dc.subject: Information extraction (en_US)
dc.subject: Clustering (en_US)
dc.subject: Topic modeling (en_US)
dc.title: An extensible parsing pipeline for unstructured data processing (en_US)
dc.type: info:eu-repo/semantics/conferenceObject (en_US)
dc.conference.date: 2021-02
dc.conference.host: ICACT (en_US)
dc.conference.location: PyeongChang, Korea (South) (en_US)
dc.contributor.affiliation: Athlone Institute of Technology (en_US)
dc.contributor.sponsor: Irish Research Council Enterprise Partnership Scheme Postgraduate Scholarship 2020 (en_US)
dc.description.peerreview: yes (en_US)
dc.identifier.doi: 10.23919/ICACT51234.2021.9370654 (en_US)
dc.identifier.orcid: https://orcid.org/0000-0002-0913-3948 (en_US)
dc.identifier.orcid: https://orcid.org/0000-0001-8359-0920 (en_US)
dc.identifier.orcid: https://orcid.org/0000-0002-8300-5813 (en_US)
dc.rights.accessrights: info:eu-repo/semantics/openAccess (en_US)
dc.subject.department: Software Research Institute AIT (en_US)
dc.type.version: info:eu-repo/semantics/acceptedVersion (en_US)
dc.relation.projectid: Project EPSPG/2020/7 (en_US)



Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International