Word Sense Disambiguation (WSD) is an open problem in Natural Language Processing that is challenging in both supervised and unsupervised settings, where all the words in a given text must be disambiguated without sufficient labeled data. Most WSD systems use the sentence or a small window of words around the target word as the context for disambiguation, because their computational complexity scales exponentially with the size of the context. In this paper, we propose a semi-supervised neural system, the Position-wise Orthogonal Knowledge-Enhanced Disambiguator (PoKED), which enables attention-driven, long-range dependency modeling for word sense disambiguation. PoKED incorporates position-wise encoding into an orthogonal framework and applies a knowledge-based attentive neural model to the WSD problem. Our unsupervised language model is first trained on an unlabeled corpus; the pre-trained model can then abstract the surrounding context of polyseme instances in a labeled corpus into context embeddings. We further exploit the semantic relations in WordNet by extracting semantic-level inter-word connections from each document-sentence pair in the WSD dataset, with a hyperparameter controlling the amount of extracted relations. Experimental results on standard benchmarks show that PoKED achieves competitive performance compared with state-of-the-art knowledge-based WSD systems.