Fair Data Pruning Implementation: Datasets, Methods, and Augmentation


Too Long; Didn't Read

This section details the datasets (CIFAR, TinyImageNet), pruning algorithms (including MetriQ), query model training, score extraction, and data augmentation used in the experiments.


Abstract and 1 Introduction

2 Data Pruning & Fairness

3 Method & Results

4 Theoretical Analysis

5 Discussion, Acknowledgments and Disclosure of Funding, and References

A Implementation Details

B Theoretical Analysis for a Mixture of Gaussians

A Implementation Details

Our empirical work encompasses three standard computer vision benchmarks (Table 1). All code is implemented in PyTorch [Paszke et al., 2017] and run on an internal cluster equipped with NVIDIA RTX8000 GPUs. We make our code available at https://github.com/avysogorets/fair-data-pruning.


Data Pruning. Each data pruning method prescribes its own procedure for training the query model and extracting scores for the training data. For EL2N and GraNd, we train the query model for 10% of the full training length reported in Table 1 before computing the importance scores, which exceeds the minimum of 10 epochs recommended by Paul et al. [2021]. To improve the score estimates, we repeat this procedure across 5 random seeds and average the scores before pruning. Forgetting and Dynamic Uncertainty collect their scores during training, so we run a single full optimization cycle of the query model. Likewise, CoreSet is applied once, using the embeddings of the fully trained query model; we use the greedy k-center variant. Since some methods require a hold-out validation set (e.g., MetriQ, CDB-W), we reserve 50% of the test set for this purpose. This split is never used when reporting the final model performance.
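As an illustration, here is a minimal PyTorch sketch of EL2N scoring with seed averaging, in the spirit of the procedure above. It is not the released implementation: `make_model` and `train_briefly` are hypothetical helpers standing in for the query-model constructor and the shortened (10%-length) training run, and the loader is assumed to iterate the training set in a fixed order.

```python
import torch
import torch.nn.functional as F

def el2n_scores(model, loader, num_classes, device="cuda"):
    """EL2N score per example: L2 norm of (softmax output - one-hot label).
    The loader must iterate the dataset in a fixed order (no shuffling)
    so that scores align with dataset indices."""
    model.eval()
    scores = []
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            probs = F.softmax(model(x), dim=1)
            onehot = F.one_hot(y, num_classes).float()
            scores.append((probs - onehot).norm(dim=1).cpu())
    return torch.cat(scores)

def averaged_el2n(make_model, train_briefly, loader, num_classes, num_seeds=5):
    """Train a fresh query model under each seed and average the EL2N scores."""
    total = torch.zeros(len(loader.dataset))
    for seed in range(num_seeds):
        torch.manual_seed(seed)
        model = make_model().to("cuda")   # hypothetical model constructor
        train_briefly(model)              # e.g., 10% of the full training schedule
        total += el2n_scores(model, loader, num_classes)
    return total / num_seeds
```

GraNd is analogous but scores each example by the norm of its loss gradient at the same early checkpoint. The greedy k-center variant of CoreSet can be sketched in the same spirit; `embeddings` below is assumed to be an n × d tensor of features from the fully trained query model:

```python
import torch

def greedy_k_center(embeddings, k):
    """Greedy k-center: start from a random point, then repeatedly add the
    point farthest from its nearest already-selected center."""
    n = embeddings.shape[0]
    selected = [torch.randint(n, (1,)).item()]
    # Distance from every point to its nearest selected center so far.
    dists = torch.cdist(embeddings, embeddings[selected]).squeeze(1)
    for _ in range(k - 1):
        idx = int(torch.argmax(dists))
        selected.append(idx)
        new_d = torch.cdist(embeddings, embeddings[idx].unsqueeze(0)).squeeze(1)
        dists = torch.minimum(dists, new_d)
    return selected
```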


Data Augmentation. We employ data augmentation only when optimizing the final model. The same augmentation strategies are used for all three datasets. In particular, we normalize examples per-channel and randomly apply shifts by at most 4 pixels in any direction and horizontal flips.
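In torchvision terms, this pipeline corresponds to something like the sketch below. The normalization statistics are the commonly used CIFAR-10 values and stand in for each dataset's own per-channel statistics; the crop size is likewise dataset-dependent (32 for CIFAR, 64 for TinyImageNet).

```python
import torchvision.transforms as T

# Commonly used CIFAR-10 per-channel statistics (illustrative; each dataset
# would use its own mean and standard deviation).
MEAN = (0.4914, 0.4822, 0.4465)
STD = (0.2470, 0.2435, 0.2616)

train_transform = T.Compose([
    T.RandomCrop(32, padding=4),  # random shifts of up to 4 pixels in any direction
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize(MEAN, STD),
])

# No augmentation outside final-model training: normalization only.
eval_transform = T.Compose([T.ToTensor(), T.Normalize(MEAN, STD)])
```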


Table 1: Summary of experimental work and hyperparameters. All architectures include batch normalization [Ioffe and Szegedy, 2015] layers followed by ReLU activations. Models are initialized with Kaiming normal [He et al., 2015] and optimized by SGD (momentum 0.9) with a stepwise LR schedule (0.2× drop factor applied on specified Drop Epochs) and categorical cross-entropy. The above hyperparameters are adopted from prior studies [Frankle et al., 2021, Wang et al., 2020].
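A minimal PyTorch sketch of this training configuration follows; the architecture, base learning rate, and drop-epoch milestones are placeholders, since the actual dataset-specific values are those listed in Table 1.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

# Placeholder architecture; like the paper's models, it uses BatchNorm + ReLU.
model = resnet18(num_classes=10)

def kaiming_init(m):
    """Kaiming-normal initialization [He et al., 2015] for conv/linear weights."""
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.kaiming_normal_(m.weight)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

model.apply(kaiming_init)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
# Stepwise schedule: the LR is multiplied by 0.2 at each drop epoch.
# Milestones and base LR here are illustrative, not the paper's values.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[60, 120], gamma=0.2)
criterion = nn.CrossEntropyLoss()  # categorical cross-entropy
```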



This paper is available on arXiv under a CC BY 4.0 DEED license.

Authors:

(1) Artem Vysogorets, Center for Data Science, New York University ([email protected]);

(2) Kartik Ahuja, Meta FAIR;

(3) Julia Kempe, New York University, Meta FAIR.

