Room X1.52, Lautrupvang 15,
2750 Ballerup, Denmark
Department of Engineering Technology
Technical University of Denmark
I received my Ph.D. in Computer Science (specializing in Mathematical Optimization) from the Department of Information Technology at Uppsala University in 2019. During my Ph.D., I interned as a visiting data scientist at The Boston Consulting Group (BCG) Gamma. After my Ph.D., I worked as a data scientist at Bolt and Wolt (DoorDash) in the domain of on-demand logistics optimization.
My current research interests are centered around theories of interpretability and efficiency of machine learning models toward On-Device AI. I explore strategies to streamline complex models without performance loss and unravel the intricate mechanisms of decision-making models. Central to this pursuit is understanding the synergy between model simplification and explainability: Reducing a model's complexity aids in elucidating its functions, and concurrently, explainability drives the efficient compression of the learning model.
L. Cao, L. You, and CSPaper Core Team, "CSPaper Review: Fast, Rubric-Faithful Conference Feedback", International Natural Language Generation Conference (INLG) 2025, accepted. [demo] [discussion]
CSPaper Review (CSPR) is a free, AI-powered tool for rapid, conference-specific peer review in Computer Science (CS). Addressing the bottlenecks of slow, inconsistent, and generic feedback in existing solutions, CSPR leverages Large Language Model (LLM) agents and tailored workflows to deliver realistic and actionable reviews within one minute. In merely four weeks, it served more than 7,000 unique users from 80 countries and processed over 15,000 reviews, highlighting strong demand from the CS community. We present our architecture, design choices, benchmarks, user analytics, and future roadmap.
Counterfactual explanations (CE) identify data points that closely resemble the observed data but produce different machine learning (ML) model outputs, offering critical insights into model decisions. Despite the diverse scenarios, goals and tasks to which they are tailored, existing CE methods often lack actionable efficiency because of unnecessary feature changes included within the explanations that are presented to users and stakeholders. We address this problem by proposing a method that minimizes the required feature changes while maintaining the validity of CE, without imposing restrictions on models or CE algorithms, whether instance- or group-based. The key innovation lies in computing a joint distribution between observed and counterfactual data and leveraging it to inform Shapley values for feature attributions (FA). We demonstrate that optimal transport (OT) effectively derives this distribution, especially when the CE method in use does not make the alignment between observed and counterfactual data clear. Additionally, we uncover a counterintuitive finding: relying on an exact alignment defined by the CE generation mechanism can be misleading when conducting FA. Our proposed method is validated by extensive experiments across multiple datasets, showcasing its effectiveness in refining CE towards greater actionable efficiency.
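As a rough illustration of how OT can yield a joint distribution between observed and counterfactual points, the sketch below runs Sinkhorn iterations for entropy-regularized OT on two small point sets. This is only a generic example, not the method of the paper: the function name `sinkhorn_coupling`, the squared Euclidean cost, the uniform weights, the regularization value, and the toy data are all assumptions made here for illustration.

```python
import math


def sinkhorn_coupling(observed, counterfactual, reg=0.5, n_iters=500):
    """Entropy-regularized OT coupling between two uniformly weighted point sets."""
    n, m = len(observed), len(counterfactual)
    a = [1.0 / n] * n  # uniform mass on observed points (assumption)
    b = [1.0 / m] * m  # uniform mass on counterfactual points (assumption)
    # Squared Euclidean cost between every observed/counterfactual pair.
    cost = [[sum((xi - yi) ** 2 for xi, yi in zip(x, y)) for y in counterfactual]
            for x in observed]
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # plan[i][j] is the joint mass assigned to (observed i, counterfactual j).
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


# Toy data: the counterfactuals are listed in an order that hides the alignment.
observed = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
counterfactual = [[0.1, 1.1], [0.1, 0.1], [1.1, 0.1]]
plan = sinkhorn_coupling(observed, counterfactual)

# For each observed point, the counterfactual that receives the most mass.
best_match = [max(range(len(row)), key=lambda j: row[j]) for row in plan]
```

Each row of `plan` spreads the mass of one observed point over the counterfactual points, so a downstream attribution step could weight observed/counterfactual pairs by these entries instead of assuming a fixed one-to-one pairing.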
Counterfactual explanations (CE) are the de facto method for providing insights into black-box decision-making models by identifying alternative inputs that lead to different outcomes. However, existing CE approaches, including group and global methods, focus predominantly on specific input modifications, lacking the ability to capture nuanced distributional characteristics that influence model outcomes across the entire input-output spectrum. This paper proposes distributional counterfactual explanation (DCE), shifting focus to the distributional properties of observed and counterfactual data, thus providing broader insights. DCE is particularly beneficial for stakeholders making strategic decisions based on statistical data analysis, as it makes the statistical distribution of the counterfactual resemble that of the factual when aligning model outputs with a target distribution, something that existing CE methods cannot fully achieve. We leverage optimal transport (OT) to formulate a chance-constrained optimization problem, deriving a counterfactual distribution aligned with its factual counterpart, supported by statistical confidence. The efficacy of this approach is demonstrated through experiments, highlighting its potential to provide deeper insights into decision-making models.
This study tackles a problem in neural network pruning: inaccurate gradients arise when computing the empirical Fisher Information Matrix (FIM). We introduce SWAP, an Entropic Wasserstein regression (EWR) network pruning formulation, capitalizing on the geometric attributes of the optimal transport (OT) problem. The “swap” of the commonly used standard linear regression (LR) with the EWR in optimization is shown analytically to excel in noise mitigation by adopting neighborhood interpolation across data points, while incurring only marginal extra computational cost. The unique strength of SWAP is its intrinsic ability to strike a balance between noise reduction and covariance information preservation. Extensive experiments performed on various networks show that SWAP performs comparably to state-of-the-art (SoTA) network pruning algorithms. Our proposed method outperforms the SoTA when the network size or the target sparsity is large; the gain is even larger in the presence of noisy gradients, possibly from noisy data, analog memory, or adversarial attacks. Notably, our proposed method achieves a 6% improvement in accuracy and an 8% improvement in testing loss for MobileNetV1 with less than one-fourth of the network parameters remaining.
(See a full list of publications here)
The paper investigates the weighted sum-rate maximization (WSRM) problem with latent interfering sources outside the known network, whose power allocation policy is hidden from, and cannot be controlled by, the optimization. The paper extends the well-known alternating optimization algorithm, weighted minimum mean square error (WMMSE), under a causal inference framework to tackle WSRM. Specifically, because the power policy in the hidden network may shift, computing an iterating direction based only on the observed interference inherently means that the counterfactual is ignored in decision making. A method called synthetic control (SC) is used to estimate the counterfactual. For any link in the known network, SC constructs a convex combination of the interference on other links and uses it as an estimate for the counterfactual. Power iteration in the proposed SC-WMMSE is performed taking into account both the observed interference and its counterfactual. SC-WMMSE requires no more information than the original WMMSE in the optimization stage. To the best of our knowledge, this is the first paper that explores the potential of SC in assisting mathematical optimization in addressing classic wireless optimization problems. Numerical results suggest the superiority of SC-WMMSE over the original in both convergence and objective.
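The convex-combination step of synthetic control can be sketched as a least-squares fit over the probability simplex. The example below is a minimal illustration under stated assumptions, not the SC-WMMSE algorithm itself: the helper names `project_to_simplex` and `fit_convex_weights`, the projected-gradient solver, the step size, and the interference values are all hypothetical choices made here.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    theta, cumulative = 0.0, 0.0
    for k, value in enumerate(sorted(v, reverse=True), start=1):
        cumulative += value
        candidate = (cumulative - 1.0) / k
        if value - candidate > 0:
            theta = candidate  # keep the threshold of the last valid k
    return [max(x - theta, 0.0) for x in v]


def fit_convex_weights(donors, target, n_iters=5000):
    """Least-squares weights on the simplex via projected gradient descent.

    donors[t][j]: interference on donor link j at time t (made-up data here).
    target[t]:    interference on the link of interest at time t.
    """
    n_donors = len(donors[0])
    # Conservative step: the squared Frobenius norm of the donor matrix
    # upper-bounds the largest eigenvalue of donors^T donors.
    step = 1.0 / (2.0 * sum(x * x for row in donors for x in row))
    w = [1.0 / n_donors] * n_donors
    for _ in range(n_iters):
        residual = [sum(wj * xj for wj, xj in zip(w, row)) - y
                    for row, y in zip(donors, target)]
        grad = [2.0 * sum(r * row[j] for r, row in zip(residual, donors))
                for j in range(n_donors)]
        w = project_to_simplex([wj - step * gj for wj, gj in zip(w, grad)])
    return w


# Toy history: the target link is an exact convex combination of three donors.
donors = [[1.0, 2.0, 0.5], [1.2, 1.8, 0.7], [0.9, 2.2, 0.4],
          [1.1, 2.1, 0.9], [1.3, 1.7, 0.6], [0.8, 2.3, 0.8]]
true_weights = [0.5, 0.3, 0.2]
target = [sum(w * x for w, x in zip(true_weights, row)) for row in donors]

weights = fit_convex_weights(donors, target)

# Counterfactual estimate for the target link from a new donor observation.
new_donor_obs = [1.0, 2.0, 0.6]
counterfactual_estimate = sum(w * x for w, x in zip(weights, new_donor_obs))
```

Constraining the weights to be non-negative and to sum to one keeps the estimate inside the range spanned by the donor links, which is the property that makes the result a convex combination rather than an unconstrained regression.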
L. You and D. Yuan. “A note on decoding order in user grouping and power optimization for multi-cell NOMA with load coupling”. IEEE Transactions on Wireless Communications, vol.20, no.1, 2021. [arXiv]
We present a new theoretical result for multi-cell non-orthogonal multiple access (NOMA). For multi-cell scenarios, a so-called load-coupling model has been proposed earlier to characterize the presence of mutual interference for NOMA, and the optimization process relies on the use of fixed-point iterations [1], [2] across cells. One difficulty here is that the order of decoding for successive interference cancellation (SIC) in NOMA is generally not known a priori. This is because the decoding order in one cell depends on interference, which, in turn, is governed by resource usage in other cells, and vice versa. To achieve convergence, previous works have used workarounds that pose restrictions to NOMA, such that the SIC decoding order remains throughout the fixed-point iterations. As a comment to [1], [2], we derive and prove the following result: The convergence is guaranteed, even if the order changes over the iterations. The result not only waives the need of previous workarounds, but also implies that a wide class of optimization problems for multi-cell NOMA is tractable, as long as that for single cell is.
Relevant Paper
L. You, D. Yuan, L. Lei, S. Sun, S. Chatzinotas, and B. Ottersten. “Resource optimization with load coupling in multi-cell NOMA”. IEEE Transactions on Wireless Communications, vol.17, no.7, 2018. [arXiv] [code]
L. You and D. Yuan. “User-centric performance optimization with remote radio head cooperation in C-RAN”, IEEE Transactions on Wireless Communications, vol.19, no.1, 2021. [arXiv]
In a cloud radio access network (C-RAN), distributed remote radio heads (RRHs) are coordinated by baseband units (BBUs) in the cloud. The centralization of signal processing provides flexibility for coordinated multi-point transmission (CoMP) of RRHs to cooperatively serve user equipments (UEs). We target enhancing UEs' capacity performance, by jointly optimizing the selection of RRHs for serving UEs, i.e., resource allocation (and CoMP selection). We analyze the computational complexity of the problem. Next, we prove that under fixed CoMP selection, the optimal resource allocation amounts to solving a so-called iterated function. Towards user-centric network optimization, we propose an algorithm for the joint optimization problem, aiming at maximally scaling up the capacity for any target UE group of interest. The proposed algorithm enables network-level performance evaluation for quality of experience.
We explore the potential of optimizing resource allocation with flexible numerology in the frequency domain and variable frame structure in the time domain, in the presence of services with different types of requirements. We analyze the computational complexity and propose a scalable optimization algorithm based on searching in both the primal space and the dual space, which are complementary to each other. Numerical results show significant advantages of adopting flexibility in both time and frequency domains for capacity enhancement and for meeting the requirements of mission-critical services.