McKean-Vlasov stochastic differential equations (MV-SDEs) describe the behavior of an infinite number of interacting particles by imposing a dependence on the particle density. These processes differ from standard It\^o-SDEs in that MV-SDEs include distributional information in the parameterization of each individual particle. We study the influence of explicitly including this distributional information in the parameterization of the SDE. We first propose a series of semi-parametric methods for representing MV-SDEs, and then propose corresponding estimators for inferring parameters from data based on the underlying properties of the MV-SDE. By analyzing the properties of the different architectures and estimators, we examine their relationship to standard It\^o-SDEs and their applicability to relevant machine learning problems. We empirically compare the performance of the different architectures on a series of real and synthetic datasets for time series and probabilistic modeling. The results suggest that including the distributional dependence in MV-SDEs is an effective modeling framework for temporal data under an exchangeability assumption, while maintaining strong performance on standard It\^o-SDE problems due to the richer class of probability flows associated with MV-SDEs.
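
To make the notion of distributional dependence concrete, the following is a minimal illustrative sketch, not one of the architectures or estimators proposed in this work. It assumes a simple textbook mean-field model in which each particle's drift pulls it toward the empirical mean of all particles, so that a finite particle system approximates the law-dependent drift of an MV-SDE. The function name `simulate_mean_field_particles` and the parameters `theta`, `sigma`, `dt`, and the initial distribution are hypothetical choices made for illustration only.

```python
import math
import random
import statistics


def simulate_mean_field_particles(n_particles=500, n_steps=200, dt=0.01,
                                  theta=1.0, sigma=0.5, seed=0):
    """Euler-Maruyama scheme for the (assumed) interacting particle system

        dX^i_t = -theta * (X^i_t - mean_j X^j_t) dt + sigma dW^i_t.

    The drift depends on the empirical mean of the particles, which stands in
    for the dependence on the law of X_t in the mean-field limit.
    Returns the empirical mean and variance of the particles at every step.
    """
    rng = random.Random(seed)
    # Hypothetical initial condition: particles drawn from N(2, 1).
    x = [rng.gauss(2.0, 1.0) for _ in range(n_particles)]
    means = [statistics.fmean(x)]
    variances = [statistics.pvariance(x)]
    noise_scale = sigma * math.sqrt(dt)
    for _ in range(n_steps):
        # Distributional information: a statistic of the whole population.
        empirical_mean = statistics.fmean(x)
        x = [
            xi - theta * (xi - empirical_mean) * dt
            + noise_scale * rng.gauss(0.0, 1.0)
            for xi in x
        ]
        means.append(statistics.fmean(x))
        variances.append(statistics.pvariance(x))
    return means, variances


means, variances = simulate_mean_field_particles()
print(f"initial variance: {variances[0]:.3f}, final variance: {variances[-1]:.3f}")
```

In this sketch the population mean stays roughly constant while the spread of the particles contracts. That behavior is driven entirely by the interaction term, and a standard It\^o-SDE whose drift depends only on the state of a single particle would have no access to this population-level statistic.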