In this paper, we propose a method for removing people from a single image, for privacy protection and other purposes, using a three-stage network consisting of depth estimation, semantic segmentation, and inpainting, as shown in Fig. 1. Within this three-stage network, we improve the semantic segmentation stage, which detects people. We construct a network specialized for the case in which the segmentation target is a person. It is known that the accuracy of conventional segmentation methods can be improved by using edge information, and that it can be improved further by increasing the accuracy of the edge map. However, edge detection does not work well when the person and the background have similar colors, because edge detection responds to brightness changes in the image. We therefore propose an adversarial loss function for edge maps. In addition, since a person in an image is expected to lie at a different depth from the background, we use a pretrained depth estimation network and include the estimated depth image in the input. In this way, we construct a people removal network that achieves high accuracy both quantitatively and qualitatively.
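The limitation of brightness-based edge detection noted above can be illustrated with a minimal sketch. It does not reproduce the edge network used in this paper. It assumes a simple horizontal gradient detector on a single image row, and the pixel colors, the threshold, and the helper names (`luminance`, `edge_strength`, `make_row`) are hypothetical choices made only for illustration.

```python
# Illustrative only: a simple gradient-based edge detector on one image row.
# This is an assumed stand-in, not the edge detector or network from the paper.


def luminance(rgb):
    """Approximate brightness of an (R, G, B) pixel using Rec. 601 luma weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def edge_strength(row):
    """Absolute brightness difference between horizontally adjacent pixels."""
    gray = [luminance(p) for p in row]
    return [abs(gray[i + 1] - gray[i]) for i in range(len(gray) - 1)]


def make_row(background, person, width=8, start=3, end=5):
    """Build one row of background pixels with a 'person' occupying [start, end)."""
    return [person if start <= x < end else background for x in range(width)]


# Hypothetical colors chosen for illustration.
person = (90, 90, 90)
distinct_bg = (220, 220, 220)  # clearly brighter than the person
similar_bg = (95, 95, 95)      # nearly the same color as the person

strong = max(edge_strength(make_row(distinct_bg, person)))
weak = max(edge_strength(make_row(similar_bg, person)))

THRESHOLD = 20.0  # hypothetical detection threshold on brightness difference

print(f"distinct background: max edge response = {strong:.1f}")
print(f"similar background:  max edge response = {weak:.1f}")
print("boundary detected (distinct):", strong > THRESHOLD)
print("boundary detected (similar): ", weak > THRESHOLD)
```

In this toy setting the person boundary produces a large response against the distinct background and only a small one against the similar background, so a fixed threshold misses it. This is the kind of failure that motivates supervising the edge map with an additional loss and supplying depth as an extra input cue.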