Results for 'Dataset'

987 found
  1. A dataset of blockage, vandalism, and harassment activities for the cause of climate change mitigation.Quan-Hoang Vuong, Minh-Hoang Nguyen & Viet-Phuong La - manuscript
    Environmental activism is crucial for raising public awareness and support toward addressing the climate crisis. However, using climate change mitigation as the cause for blockage, vandalism, and harassment activities might be counterproductive and risk causing negative repercussions and declining public support. The paper describes a dataset of metadata of 89 blockage, vandalism, and harassment events happening in recent years. The dataset comprises three main categories: 1) Events, 2) Activists, and 3) Consequences. For researchers interested in environmental activism, climate (...)
    1 citation
  2.  14
    Reading datasets: Strategies for interpreting the politics of data signification.Lindsay Poirier - 2021 - Big Data and Society 8 (2).
    All datasets emerge from and are enmeshed in power-laden semiotic systems. While emerging data ethics curriculum is supporting data science students in identifying data biases and their consequences, critical attention to the cultural histories and vested interests animating data semantics is needed to elucidate the assumptions and political commitments on which data rest, along with the externalities they produce. In this article, I introduce three modes of reading that can be engaged when studying datasets—a denotative reading, a connotative reading, and (...)
    2 citations
  3.  31
    Cross-Dataset Variability Problem in EEG Decoding With Deep Learning.Lichao Xu, Minpeng Xu, Yufeng Ke, Xingwei An, Shuang Liu & Dong Ming - 2020 - Frontiers in Human Neuroscience 14.
  4. AGGA: A Dataset of Academic Guidelines for Generative AIs.Junfeng Jiao, Saleh Afroogh, Kevin Chen, David Atkinson & Amit Dhurandhar - 2024 - Harvard Dataverse 4.
    AGGA (Academic Guidelines for Generative AIs) is a dataset of 80 academic guidelines for the usage of generative AIs and large language models in academia, selected systematically and collected from official university websites across six continents. Comprising 181,225 words, the dataset supports natural language processing tasks such as language modeling, sentiment and semantic analysis, model synthesis, classification, and topic labeling. It can also serve as a benchmark for ambiguity detection and requirements categorization. This resource aims to facilitate research (...)
  5. SeCoDa: Sense Complexity Dataset.David Strohmaier, Sian Gooding, Shiva Taslimipoor & Ekaterina Kochmar - 2020 - Proceedings of the 12th Language Resources and Evaluation Conference.
    The Sense Complexity Dataset (SeCoDa) provides a corpus that is annotated jointly for complexity and word senses. It thus provides a valuable resource for both word sense disambiguation and the task of complex word identification. The intention is that this dataset will be used to identify complexity at the level of word senses rather than word tokens. For word sense annotation SeCoDa uses a hierarchical scheme that is based on information available in the Cambridge Advanced Learner’s Dictionary. This (...)
  6.  29
    Cosmic Bayes. Datasets and priors in the hunt for dark energy.Michela Massimi - 2021 - European Journal for Philosophy of Science 11 (1):1-21.
    Bayesian methods are ubiquitous in contemporary observational cosmology. They enter into three main tasks: cross-checking datasets for consistency; fixing constraints on cosmological parameters; and model selection. This article explores some epistemic limits of using Bayesian methods. The first limit concerns the degree of informativeness of the Bayesian priors and an ensuing methodological tension between task and task. The second limit concerns the choice of wide flat priors and related tension between parameter estimation and model selection. The Dark Energy Survey and (...)
    1 citation
  7.  15
    IGGA: A Dataset of Industrial Guidelines and Policy Statements for Generative AIs.Junfeng Jiao, Saleh Afroogh, Kevin Chen, David Atkinson & Amit Dhurandhar - 2024 - Harvard Dataverse 2.
    IGGA (Industrial Guidelines/policy statements for Generative AIs) is a comprehensive dataset comprising 160 guidelines and policy statements pertaining to the use of generative AIs and large language models across 14 industry sectors. These guidelines were systematically selected and gathered from official company websites and reliable sources spanning six continents. The dataset, containing 295,692 words, is designed to support various natural language processing tasks, including language modeling, sentiment analysis, semantic analysis, model synthesis, classification, and topic labeling. Additionally, it serves (...)
  8.  22
    Hebrew offensive language taxonomy and dataset.Marina Litvak, Natalia Vanetik & Chaya Liebeskind - 2023 - Lodz Papers in Pragmatics 19 (2):325-351.
    This paper introduces a streamlined taxonomy for categorizing offensive language in Hebrew, addressing a gap in the literature that has, until now, largely focused on Indo-European languages. Our taxonomy divides offensive language into seven levels (six explicit and one implicit level). We based our work on the simplified offensive language (SOL) taxonomy introduced in (Lewandowska-Tomaszczyk et al. 2021a) hoping that our adjustment of SOL to the Hebrew language will be capable of reflecting the unique linguistic and cultural nuances of Hebrew. (...)
    1 citation
  9.  16
    Normed dataset for novel metaphors, novel similes, literal and anomalous sentences in Chinese.Xin Wang - 2022 - Frontiers in Psychology 13.
  10.  7
    Out of dataset, out of algorithm, out of mind: a critical evaluation of AI bias against disabled people.Rohan Manzoor, Wajahat Hussain & Muhammad Latif Anjum - forthcoming - AI and Society:1-11.
    Generative AI models are shaping our future. In this work, we discover and expose the bias against physically challenged people in generative models. Generative models (Stable Diffusion XL and DALL·E 3) are unable to generate content related to the physically challenged, e.g., inclusive washroom, even with very detailed prompts. Our analysis reveals that this disability bias emanates from biased AI datasets. We achieve this using a novel strategy to automatically discover bias against underrepresented groups like the physically challenged. Finally, we (...)
  11.  44
    A Review of Dynamic Datasets for Facial Expression Research. [REVIEW]Eva G. Krumhuber, Lina Skora, Dennis Küster & Linyun Fou - 2017 - Emotion Review 9 (3):280-292.
    Temporal dynamics have been increasingly recognized as an important component of facial expressions. With the need for appropriate stimuli in research and application, a range of databases of dynamic facial stimuli has been developed. The present article reviews the existing corpora and describes the key dimensions and properties of the available sets. This includes a discussion of conceptual features in terms of thematic issues in dataset construction as well as practical features which are of applied interest to stimulus usage. (...)
    19 citations
  12.  23
    Benchmark Pashto Handwritten Character Dataset and Pashto Object Character Recognition (OCR) Using Deep Neural Network with Rule Activation Function.Imran Uddin, Dzati A. Ramli, Abdullah Khan, Javed Iqbal Bangash, Nosheen Fayyaz, Asfandyar Khan & Mahwish Kundi - 2021 - Complexity 2021:1-16.
    In the area of machine learning, different techniques are used to train machines and perform different tasks like computer vision, data analysis, natural language processing, and speech recognition. Computer vision is one of the main branches where machine learning and deep learning techniques are being applied. Optical character recognition is the ability of a machine to recognize the character of a language. Pashto is one of the most ancient and historical languages of the world, spoken in Afghanistan and Pakistan. OCR (...)
  13.  9
    Japanese tort-case dataset for rationale-supported legal judgment prediction.Hiroaki Yamada, Takenobu Tokunaga, Ryutaro Ohara, Akira Tokutsu, Keisuke Takeshita & Mihoko Sumida - forthcoming - Artificial Intelligence and Law:1-25.
    This paper presents the first dataset for Japanese Legal Judgment Prediction (LJP), the Japanese Tort-case Dataset (JTD), which features two tasks: tort prediction and its rationale extraction. The rationale extraction task identifies the court’s accepting arguments from alleged arguments by plaintiffs and defendants, which is a novel task in the field. JTD is constructed based on annotated 3477 Japanese Civil Code judgments by 41 legal experts, resulting in 7978 instances with 59,697 of their alleged arguments from the involved (...)
  14. Gender, age, research experience, leading role and academic productivity of Vietnamese researchers in the social sciences and humanities: exploring a 2008-2017 Scopus dataset.Quan-Hoang Vuong - 2017 - European Science Editing 43 (3):51-55.
    Background: Academic productivity has been studied by scholars all round the world for many years. However, in Vietnam, this topic has scarcely been addressed. This research therefore aims at better understanding the correlations between gender, age, research experience, the leading role of corresponding authors, and the total number of their publications in the specific realm of social sciences and humanities. Methods: The study employed a Scopus dataset with publication profiles of 410 Vietnamese researchers between 2008 and 2017. Results: Men (...)
  15.  48
    What makes Big Data, Big Data? Exploring the ontological characteristics of 26 datasets.Gavin McArdle & Rob Kitchin - 2016 - Big Data and Society 3 (1).
    Big Data has been variously defined in the literature. In the main, definitions suggest that Big Data possess a suite of key traits: volume, velocity and variety, but also exhaustivity, resolution, indexicality, relationality, extensionality and scalability. However, these definitions lack ontological clarity, with the term acting as an amorphous, catch-all label for a wide selection of data. In this paper, we consider the question ‘what makes Big Data, Big Data?’, applying Kitchin’s taxonomy of seven Big Data traits to 26 datasets (...)
    31 citations
  16. Classifying offensive language in Arabic: a novel taxonomy and dataset.Chaya Liebeskind, Ali Afawi, Marina Litvak & Natalia Vanetik - 2024 - Lodz Papers in Pragmatics 20 (2):433-462.
    This paper presents a streamlined taxonomy for categorizing offensive language in Arabic, specifically Modern Standard Arabic (MSA) and the Levantine dialect. Addressing a gap in the existing literature, which has mainly focused on Indo-European languages, our taxonomy divides offensive language into seven levels (six explicit and one implicit). We adapted our framework from the simplified offensive language (SOL) taxonomy by (Lewandowska-Tomaszczyk, Barbara, Slavko Žitnik, Anna Bączkowska, Chaya Liebeskind, Jelena Mitrovic & Giedre Valunaite Oleškeviciente. 2021a. Lod-connected offensive language ontology and tagset (...)
  17.  43
    Understanding and assessing uncertainty of observational datasets for model evaluation using ensembles.Marius Zumwald, Benedikt Knüsel, Christoph Baumberger, Gertrude Hirsch Hadorn, David Bresch & Reto Knutti - 2020 - WIREs Climate Change 10:1-19.
    In climate science, observational gridded climate datasets that are based on in situ measurements serve as evidence for scientific claims and they are used to both calibrate and evaluate models. However, datasets only represent selected aspects of the real world, so when they are used for a specific purpose they can be a source of uncertainty. Here, we present a framework for understanding this uncertainty of observational datasets which distinguishes three general sources of uncertainty: (1) uncertainty that arises during the (...)
  18.  36
    On the genealogy of machine learning datasets: A critical history of ImageNet.Hilary Nicole, Andrew Smart, Razvan Amironesei, Alex Hanna & Emily Denton - 2021 - Big Data and Society 8 (2).
    In response to growing concerns of bias, discrimination, and unfairness perpetuated by algorithmic systems, the datasets used to train and evaluate machine learning models have come under increased scrutiny. Many of these examinations have focused on the contents of machine learning datasets, finding glaring underrepresentation of minoritized groups. In contrast, relatively little work has been done to examine the norms, values, and assumptions embedded in these datasets. In this work, we conceptualize machine learning datasets as a type of informational infrastructure, (...)
    12 citations
  19.  42
    The BEST Dataset of Language Proficiency.Angela de Bruin, Manuel Carreiras & Jon Andoni Duñabeitia - 2017 - Frontiers in Psychology 8.
    6 citations
  20.  26
    The Challenges of Large‐Scale, Web‐Based Language Datasets: Word Length and Predictability Revisited.Stephan C. Meylan & Thomas L. Griffiths - 2021 - Cognitive Science 45 (6):e12983.
    Language research has come to rely heavily on large‐scale, web‐based datasets. These datasets can present significant methodological challenges, requiring researchers to make a number of decisions about how they are collected, represented, and analyzed. These decisions often concern long‐standing challenges in corpus‐based language research, including determining what counts as a word, deciding which words should be analyzed, and matching sets of words across languages. We illustrate these challenges by revisiting “Word lengths are optimized for efficient communication” (Piantadosi, Tily, & Gibson, (...)
    6 citations
  21.  49
    Polish pseudo-words list: dataset of 3023 stimuli with competent judges’ ratings.Kamil K. Imbir, Tomasz Spustek & Jarosław Żygierewicz - 2015 - Frontiers in Psychology 6.
    3 citations
  22.  22
    Challenges as catalysts: how Waymo’s Open Dataset Challenges shape AI development.Sam Hind, Fernando N. van der Vlist & Max Kanderske - forthcoming - AI and Society:1-17.
    Artificial intelligence (AI) and machine learning (ML) are becoming increasingly significant areas of research for scholars in science and technology studies (STS) and media studies. In March 2020, Waymo, Google/Alphabet’s autonomous vehicle project, introduced the ‘Open Dataset Virtual Challenge’, an annual competition leveraging their Waymo Open Dataset. This freely accessible dataset comprises annotated autonomous vehicle data from their own Waymo vehicles. Yearly, Waymo has continued to host iterations of this challenge, inviting teams of computer scientists to tackle (...)
    1 citation
  23.  30
    Big data and Belmont: On the ethics and research implications of consumer-based datasets.Remy Stewart - 2021 - Big Data and Society 8 (2).
    Consumer-based datasets are the products of data brokerage firms that agglomerate millions of personal records on the adult US population. This big data commodity is purchased by both companies and individual clients for purposes such as marketing, risk prevention, and identity searches. The sheer magnitude and population coverage of available consumer-based datasets and the opacity of the business practices that create these datasets pose emergent ethical challenges within the computational social sciences that have begun to incorporate consumer-based datasets into empirical (...)
    1 citation
  24.  14
    Objective Bayesian nets for integrating consistent datasets.Jürgen Landes & Jon Williamson - 2022 - Journal of Artificial Intelligence Research 74:393-458.
    This paper addresses a data integration problem: given several mutually consistent datasets each of which measures a subset of the variables of interest, how can one construct a probabilistic model that fits the data and gives reasonable answers to questions which are under-determined by the data? Here we show how to obtain a Bayesian network model which represents the unique probability function that agrees with the probability distributions measured by the datasets and otherwise has maximum entropy. We provide a general (...)
    2 citations
  25.  30
    Capturing the Varieties of Natural Language Inference: A Systematic Survey of Existing Datasets and Two Novel Benchmarks.Reto Gubelmann, Ioannis Katis, Christina Niklaus & Siegfried Handschuh - 2023 - Journal of Logic, Language and Information 33 (1):21-48.
    Transformer-based Pre-Trained Language Models currently dominate the field of Natural Language Inference (NLI). We first survey existing NLI datasets, and we systematize them according to the different kinds of logical inferences that are being distinguished. This shows two gaps in the current dataset landscape, which we propose to address with one dataset that has been developed in argumentative writing research as well as a new one building on syllogistic logic. Throughout, we also explore the promises of ChatGPT. Our (...)
    1 citation
  26.  32
    Objective Bayesian Nets from Consistent Datasets.Jürgen Landes & Jon Williamson - unknown
    This paper addresses the problem of finding a Bayesian net representation of the probability function that agrees with the distributions of multiple consistent datasets and otherwise has maximum entropy. We give a general algorithm which is significantly more efficient than the standard brute-force approach. Furthermore, we show that in a wide range of cases such a Bayesian net can be obtained without solving any optimisation problem.
    5 citations
  27.  25
    Data Cleaners for Pristine Datasets: Visibility and Invisibility of Data Processors in Social Science.Jean-Christophe Plantin - 2019 - Science, Technology, and Human Values 44 (1):52-73.
    This article investigates the work of processors who curate and “clean” the data sets that researchers submit to data archives for archiving and further dissemination. Based on ethnographic fieldwork conducted at the data processing unit of a major US social science data archive, I investigate how these data processors work, under which status, and how they contribute to data sharing. This article presents two main results. First, it contributes to the study of invisible technicians in science by showing that the (...)
    1 citation
  28. Eliciting Welfare Preferences from Behavioral Datasets.Ariel Rubinstein - unknown
    A behavioral dataset contains various preference orderings displayed by the same individual in different payoff-irrelevant circumstances. We introduce a framework for eliciting the individual’s underlying preferences in such cases, in which it is conjectured that the variation in the observed preference orderings is the outcome of some cognitive process that distorts the underlying preferences. We then demonstrate for two cognitive processes how to elicit the individual’s underlying preferences from behavioral datasets.
     
    2 citations
  29.  15
    Creation and Validation of the Japanese Cute Infant Face (JCIF) Dataset.Hiroshi Nittono, Akane Ohashi & Masashi Komori - 2022 - Frontiers in Psychology 13.
    Research interest in cuteness perception and its effects on subsequent behavior and physiological responses has recently been increasing. The purpose of the present study was to produce a dataset of Japanese infant faces that are free of portrait rights and can be used for cuteness research. A total of 80 original facial images of 6-month-old infants were collected from their parents. The cuteness level of each picture was rated on a 7-point scale by 200 Japanese people. Prototypical high- and (...)
    1 citation
  30.  42
    MHC‐dependent mate choice in humans: Why genomic patterns from the HapMap European American dataset support the hypothesis.Romain Laurent & Raphaëlle Chaix - 2012 - Bioessays 34 (4):267-271.
    The role of the major histocompatibility complex (MHC) in mate choice in humans is controversial. Nowadays, the availability of genetic variation data at genomic scales allows for a careful assessment of this question. In 2008, Chaix et al. reported evidence for MHC‐dependent mate choice among European American spouses from the HapMap 2 dataset. Recently, Derti et al. suggested that this observation was not robust. Furthermore, when Derti et al. applied similar analyses to the HapMap 3 European American samples, they (...)
    4 citations
  31.  69
    Transferable Feature Representation for Visible-to-Infrared Cross-Dataset Human Action Recognition.Yang Liu, Zhaoyang Lu, Jing Li, Chao Yao & Yanzi Deng - 2018 - Complexity 2018:1-20.
    Recently, infrared human action recognition has attracted increasing attention for it has many advantages over visible light, that is, being robust to illumination change and shadows. However, the infrared action data is limited until now, which degrades the performance of infrared action recognition. Motivated by the idea of transfer learning, an infrared human action recognition framework using auxiliary data from visible light is proposed to solve the problem of limited infrared action data. In the proposed framework, we first construct a (...)
    2 citations
  32.  6
    LAWSUIT: a LArge expert-Written SUmmarization dataset of ITalian constitutional court verdicts.Luca Ragazzi, Gianluca Moro, Stefano Guidi & Giacomo Frisoni - forthcoming - Artificial Intelligence and Law:1-37.
    Large-scale public datasets are vital for driving the progress of abstractive summarization, especially in law, where documents have highly specialized jargon. However, the available resources are English-centered, limiting research advancements in other languages. This paper introduces LAWSUIT, a collection of 14K Italian legal verdicts with expert-authored abstractive maxims drawn from the Constitutional Court of the Italian Republic. LAWSUIT presents an arduous task with lengthy source texts and evenly distributed salient content. We offer extensive experiments with sequence-to-sequence and segmentation-based approaches, revealing that the latter (...)
  33.  44
    Avicenna: a challenge dataset for natural language generation toward commonsense syllogistic reasoning.Zeinab Aghahadi & Alireza Talebpour - 2022 - Journal of Applied Non-Classical Logics 32 (1):55-71.
    Syllogism is a type of everyday reasoning. For instance, given that ‘Avicenna wrote the famous book the Canon of Medicine’ and ‘The Canon of Medicine has influenced modern medicine,’ it can be conc...
  34.  11
    A Digital Capabilities Dataset From Small- and Medium-Sized Enterprises in the Basque Country.Nekane Aramburu, Klaus North, Agustín Zubillaga & María Paz Salmador - 2021 - Frontiers in Psychology 11.
  35. Hyperstructures, topology and datasets.Nils A. Baas - 2009 - Axiomathes 19 (3):281-295.
    In the natural sciences higher order structures often occur. There seems to be a need for good methods of describing what we mean by higher order structures in various contexts. This is what hyperstructures are intended to do. We motivate and introduce this new concept. Next we illustrate how it can be applied in various types of genomic analysis—in particular the correlations between single nucleotide polymorphisms and diseases. The suggested structure is quite general and may be applied to a variety of (...)
    1 citation
  36.  26
    A COVID-19 Rumor Dataset.Mingxi Cheng, Songli Wang, Xiaofeng Yan, Tianqi Yang, Wenshuo Wang, Zehao Huang, Xiongye Xiao, Shahin Nazarian & Paul Bogdan - 2021 - Frontiers in Psychology 12.
  37.  13
    On learning context-aware rules to link RDF datasets.Andrea Cimmino & Rafael Corchuelo - 2021 - Logic Journal of the IGPL 29 (2):151-166.
    Integrating RDF datasets has become a relevant problem for both researchers and practitioners. In the literature, there are many genetic proposals that learn rules that allow to link the resources that refer to the same real-world entities, which is paramount to integrating the datasets. Unfortunately, they are context-unaware because they focus on the resources and their attributes but forget about their neighbours. This implies that they fall short in cases in which different resources have similar attributes but refer to different (...)
  38.  14
    The Attentive Cursor Dataset.Luis A. Leiva & Ioannis Arapakis - 2020 - Frontiers in Human Neuroscience 14.
  39.  8
    MOWDOC: A Dataset of Documents From Taking the Measure of Work for Building a Latent Semantic Analysis Space.Kim F. Nimon - 2021 - Frontiers in Psychology 11.
  40.  14
    Assessing School Engagement Intervention Dataset of Nigerian Pre-service TVET Teachers.Godwin Keres Okoro Okereke, Samson Ikenna Nwaodo, Hyginus Osita Omeje, Joshua Onyedikachi Ike, Sylvanus Umunnakwe Njoku, George Nwachukwu Ogbonna, Victor Ikechukwu Oguejiofor, Ifeoma Bernadine Onah, Ogbonnaya Okorie Eze, Pauline Ijeoma Obe, Benedicta Anene Omeje, Ikechukwu Jerry Ogbonna, Nwahunanya Innocent, Veronica Nkechi Imakwu, Ogechukwu Onah, Catherine Chiugo Kanu, John Lliya, Ebiegberi Kontei & Eunice Nwakaego Onah - 2022 - Frontiers in Psychology 13.
  41. Bigger Isn’t Better: The Ethical and Scientific Vices of Extra-Large Datasets in Language Models.Trystan S. Goetze & Darren Abramson - 2021 - WebSci '21: Proceedings of the 13th Annual ACM Web Science Conference (Companion Volume).
    The use of language models in Web applications and other areas of computing and business have grown significantly over the last five years. One reason for this growth is the improvement in performance of language models on a number of benchmarks — but a side effect of these advances has been the adoption of a “bigger is always better” paradigm when it comes to the size of training, testing, and challenge datasets. Drawing on previous criticisms of this paradigm as applied (...)
  42.  27
    Citizens’ data afterlives: Practices of dataset inclusion in machine learning for public welfare.Helene Friis Ratner & Nanna Bonde Thylstrup - forthcoming - AI and Society:1-11.
    Public sector adoption of AI techniques in welfare systems recasts historic national data as resource for machine learning. In this paper, we examine how the use of register data for development of predictive models produces new ‘afterlives’ for citizen data. First, we document a Danish research project’s practical efforts to develop an algorithmic decision-support model for social workers to classify children’s risk of maltreatment. Second, we outline the tensions emerging from project members’ negotiations about which datasets to include. Third, we (...)
  43.  35
    An Approach to Data Reduction for Learning from Big Datasets: Integrating Stacking, Rotation, and Agent Population Learning Techniques.Ireneusz Czarnowski & Piotr Jędrzejowicz - 2018 - Complexity 2018:1-13.
    In the paper, several data reduction techniques for machine learning from big datasets are discussed and evaluated. The discussed approach focuses on combining several techniques including stacking, rotation, and data reduction aimed at improving the performance of the machine classification. Stacking is seen as the technique allowing to take advantage of the multiple classification models. The rotation-based techniques are used to increase the heterogeneity of the stacking ensembles. Data reduction makes it possible to classify instances belonging to big datasets. We (...)
    1 citation
  44.  14
    Annotated insights into legal reasoning: A dataset of Article 6 ECHR cases.Jack Mumford, Katie Atkinson & Trevor Bench-Capon - 2024 - Argument and Computation 15 (2):113-119.
    We present a novel annotated dataset of legal cases pertaining to Article 6 – the right to a fair trial – of the European Convention on Human Rights (ECHR). This dataset will serve as a useful resource to the research community, to assist in the training and evaluation of AI systems designed to embody the legal reasoning involved in determining the appropriate legal outcome from a description of the case material. The annotations were applied to provide finer-grain classifications (...)
  45.  23
    Lessons learned building a legal inference dataset.Sungmi Park & Joshua I. James - 2024 - Artificial Intelligence and Law 32 (4):1011-1044.
    Legal inference is fundamental for building and verifying hypotheses in police investigations. In this study, we build a Natural Language Inference dataset in Korean for the legal domain, focusing on criminal court verdicts. We developed an adversarial hypothesis collection tool that can challenge the annotators and give us a deep understanding of the data, and a hypothesis network construction tool with visualized graphs to show a use case scenario of the developed model. The data is augmented using a combination (...)
  46.  42
    Abstract meaning representation for legal documents: an empirical research on a human-annotated dataset.Sinh Trong Vu, Minh Le Nguyen & Ken Satoh - 2022 - Artificial Intelligence and Law 30 (2):221-243.
    Natural language processing techniques increasingly contribute to the analysis of legal documents, which supports the implementation of laws and rules using computers. Previous approaches to representing a legal sentence are often based on logical patterns that illustrate the relations between concepts in the sentence, which often consist of multiple words. Those representations lack semantic information at the word level. In our work, we aim to tackle such shortcomings by representing legal texts in the form of abstract meaning representation, (...)
  47.  28
    Neural Models for Imputation of Missing Ozone Data in Air-Quality Datasets.Ángel Arroyo, Álvaro Herrero, Verónica Tricio, Emilio Corchado & Michał Woźniak - 2018 - Complexity 2018:1-14.
    Ozone is one of the pollutants with the most negative effects on human health and, in general, on the biosphere. Many data-acquisition networks collect data about ozone values in both urban and background areas. Usually, these data are incomplete or corrupt and the imputation of the missing values is a priority in order to obtain complete datasets, solving the uncertainty and vagueness of existing problems to manage complexity. In the present paper, multiple-regression techniques and Artificial Neural Network models are applied to (...)
    No categories
  48.  21
    Avoiding the Inherent Limitations in Datasets Used for Measuring Aesthetics When Using a Machine Learning Approach.Adrian Carballal, Carlos Fernandez-Lozano, Nereida Rodriguez-Fernandez, Luz Castro & Antonino Santos - 2019 - Complexity 2019:1-12.
    An important topic in evolutionary art is the development of systems that can mimic the aesthetic decisions made by human beings, e.g., fitness evaluations made by humans using interactive evolution in generative art. This paper focuses on the analysis of several datasets used for aesthetic prediction based on ratings from photography websites and psychological experiments. Since these datasets present problems, we proposed a new dataset that is a subset of DPChallenge.com. Subsequently, three different evaluation methods were considered, one derived (...)
    No categories
  49.  7
    The East Asian Erotic Picture Dataset and Gender Differences in Response to Opposite-Sex Erotic Stimuli in Chinese College Students.Qianqian Cui, Zixiang Wang, Ziyuan Zhang & Yansong Li - 2021 - Frontiers in Psychology 12.
    Understanding the processing of sexual stimuli has become a significant part of research on human sexuality. In addition to individual characteristics, empirical studies have shown that cultural factors play an important role in sexual stimuli processing. The attitudes toward sex have been reported to be more conservative in East Asian societies as compared to western countries, and significantly more sexual difficulties are observed among East Asian people. However, stimulus materials, which potentially facilitate human sexuality research on native East Asian people, (...)
  50.  24
    Computational History of Philosophy of Science Dataset.Daniel J. Hicks, Rick Morris & Evelyn Brister - unknown
    The Computational History of Philosophy of Science Dataset aims to be a comprehensive set of article and book chapter metadata for philosophy of science. The dataset covers the full run of over 40 journals and 3 major book series in the field. An automated author disambiguation script is used to construct canonical names for each author, and a combination of gender attribution methods is used to attribute the gender of each author. The full code used to generate the (...)
1 — 50 / 987