Talk Information
Topic: Truthful Dataset Valuation by Pointwise Mutual Information
Speaker: Shuran Zheng (郑舒冉)
Location: Tencent Meeting: 956-223-968 (or click "Read the original")
Time: Saturday, April 6, 2024, 20:00

Abstract
In the age of artificial intelligence (AI), data serves as the lifeblood that fuels innovation and development. A common way to evaluate a dataset in ML is to train a model on it and assess the model's performance on a test set. However, this approach has two issues: (1) it may incentivize undesirable data manipulation in data marketplaces, as self-interested data providers seek to modify the dataset to maximize their evaluation scores; (2) it may select datasets that overfit to potentially small test sets. We propose a new data valuation method that provably guarantees the following: data providers always maximize their expected score by truthfully reporting their observed data. Any manipulation of the data, including but not limited to data duplic
………………………………