Natural language provides an intuitive and flexible way for human users to specify tasks to robots. Linear temporal logic (LTL) provides unambiguous semantics for grounding natural language, and its compositional grammar can facilitate skill transfer. However, existing methods are limited in their ability to ground natural language commands in unseen environments, and existing learning algorithms for temporal tasks support only limited policy reuse for novel specifications. This thesis proposes an integrated robotic system, spanning natural language task specification to robot actions, that generalizes to unseen environments and tasks. We first introduce Lang2LTL, a modular system that uses pre-trained large language models to ground temporally extended natural language commands to LTL expressions. Lang2LTL achieves higher accuracy than previous state-of-the-art methods in grounding complex temporal task specifications in unseen environments at city and house scales. We then propose LTL-Transfer, an algorithm that leverages the compositional nature of LTL to reuse learned policies to solve novel tasks zero-shot. We deploy the integrated system on a physical robot to solve temporal tasks specified in natural language zero-shot. Our ongoing work investigates the use of human-robot dialog to resolve ambiguity when grounding natural language task specifications.