Controlling robots to perform diverse tasks in unstructured environments is pivotal for real-world robotic applications. However, achieving this capability poses significant challenges, as it requires a system that can generalize from limited data to a wide range of tasks and environments. Existing methods rely on large, costly datasets and still struggle to generalize. In this talk, I will present several successful cases and one ongoing effort that use imitation learning and generative models to train generalizable robot manipulation skills. The approaches include using symmetry-constrained models for few-shot learning, encoding semantic priors for zero-shot manipulation, and leveraging video generation for multi-task robot manipulation.