Ever wanted to add a dictionary feature to your applications but didn't know where to start? It turns out the Internet is a wonderful place and has everything you need to craft your own English dictionary, either for your own use or to integrate into your products.

Google Play Books - In app dictionary feature

tl;dr

If you want a simple, quick-to-set-up dictionary with definitions, parts of speech and example sentences, use Wordnet. If you want a more advanced and future-proof dictionary, take some time to make Wiktionary work for you.

Commercial Dictionary API

There are a lot of websites that offer a full-fledged dictionary online for free, and some have an API set up so developers can integrate the dictionary service into their own systems. For APIs, though, it's almost impossible to find a service that is free to use, since servers and services are costly to maintain. Some do offer a free tier to try their API or to use in non-commercial projects.

There may be other dictionary API services that offer better free plans, but the quality of their entries is hardly comparable to those above. Moreover, what we are looking for is a more flexible dictionary that we can use freely.

Public domain literature

An adequate dictionary does not appear out of thin air. Most open dictionary projects are either crowdsourcing efforts or adaptations of public domain literature, and among public domain dictionaries there isn't much variety to choose from.

Webster’s Revised Unabridged Dictionary (1913 edition)

Webster's Revised Unabridged Dictionary is a public domain dictionary, and it is used in many unofficial dictionary projects. Project Gutenberg distributes a digital version of the dictionary in plain text. An example entry is shown below.

ADJUSTMENT
Ad*just"ment, n. Etym: [Cf. F. ajustement. See Adjust.]

1. The act of adjusting, or condition of being adjusted; act of
bringing into proper relations; regulation.
Success depends on the nicest and minutest adjustment of the parts
concerned. Paley.

2. (Law)

Defn: Settlement of claims; an equitable arrangement of conflicting
claims, as in set-off, contribution, exoneration, subrogation, and
marshaling. Bispham.

3. The operation of bringing all the parts of an instrument, as a
microscope or telescope, into their proper relative position for use;
the condition of being thus adjusted; as, to get a good adjustment;
to be in or out of adjustment.

Syn.
 -- Suiting; fitting; arrangement; regulation; settlement;
adaptation; disposition.
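Because the Gutenberg file is plain text, you have to recover the structure yourself. Below is a minimal sketch, assuming the layout of the entry above holds throughout the file: each headword sits alone on an all-caps line, the part of speech follows the first comma of the next line, and blank lines separate the senses. These are heuristics inferred from the sample, not a documented format, and the Project Gutenberg license header and footer should be cut off before parsing. The names `parse_webster` and `SAMPLE` are hypothetical.

```python
import re

# Heuristic (assumption): a headword is a line made only of capital letters,
# spaces, apostrophes, hyphens or semicolons, e.g. "ADJUSTMENT".
HEADWORD = re.compile(r"^[A-Z][A-Z' ;-]*$")
# Heuristic: the part of speech follows the first comma, e.g. ", n." or ", v. t."
POS = re.compile(r",\s*([a-z]+\.(?:\s*[a-z]+\.)*)")


def parse_webster(text):
    """Split Gutenberg-style Webster text into a list of entry dicts."""
    raw_entries = []
    for line in text.splitlines():
        stripped = line.strip()
        if HEADWORD.match(stripped):
            raw_entries.append((stripped, []))
        elif raw_entries:
            raw_entries[-1][1].append(stripped)

    entries = []
    for headword, lines in raw_entries:
        body = "\n".join(lines).strip()
        # Blank lines separate the entry line, numbered senses, "Defn:" and "Syn." blocks;
        # each block is joined into a single line of text.
        blocks = [" ".join(b.split()) for b in re.split(r"\n\s*\n", body) if b.strip()]
        pos_match = POS.search(blocks[0]) if blocks else None
        entries.append({
            "headword": headword,
            "pos": pos_match.group(1) if pos_match else None,
            "senses": blocks[1:],
        })
    return entries


SAMPLE = '''ADJUSTMENT
Ad*just"ment, n. Etym: [Cf. F. ajustement. See Adjust.]

1. The act of adjusting, or condition of being adjusted; act of
bringing into proper relations; regulation.

2. (Law)

Defn: Settlement of claims; an equitable arrangement of conflicting
claims, as in set-off, contribution, exoneration, subrogation, and
marshaling. Bispham.
'''

entry = parse_webster(SAMPLE)[0]
print(entry["headword"], entry["pos"])  # ADJUSTMENT n.
for sense in entry["senses"]:
    print("-", sense)
```

Note that a "Defn:" block comes out as its own item rather than being merged into the numbered sense before it; stitching those together is left as a refinement.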

Moby Project

Moby Project is a public domain lexical resource. It doesn't contain a dictionary, which means there are no definitions or example sentences. However, Moby Project does offer many valuable resources, such as hyphenation, part of speech, pronunciation and related-terms lists, as well as word lists in multiple languages.
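If you want to pair Moby's lists with definitions from another source, the part-of-speech list is an easy file to start with. Below is a minimal sketch, assuming the layout of the Moby part-of-speech list as I recall it: one word per line, with a × character (byte 0xD7, decoded here as Latin-1) separating the word from a string of single-letter codes. The code table, the file name `mobypos.txt` and the function `parse_moby_pos` are assumptions to check against the README shipped with your copy, and the sample lines are made up.

```python
# Assumed code table for the Moby part-of-speech list; verify it against the README.
POS_CODES = {
    "N": "noun",
    "p": "plural",
    "h": "noun phrase",
    "V": "verb (participle)",
    "t": "transitive verb",
    "i": "intransitive verb",
    "A": "adjective",
    "v": "adverb",
    "C": "conjunction",
    "P": "preposition",
    "!": "interjection",
    "r": "pronoun",
    "D": "definite article",
    "I": "indefinite article",
    "o": "nominative",
}


def parse_moby_pos(lines, separator="\u00d7"):
    """Map each word to a list of part-of-speech names (hypothetical helper)."""
    pos_by_word = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if separator not in line:
            continue  # skip blank or malformed lines
        word, codes = line.split(separator, 1)
        # Unknown letters are kept visible instead of being silently dropped
        pos_by_word[word] = [POS_CODES.get(code, f"unknown ({code})") for code in codes]
    return pos_by_word


# Made-up sample lines in the assumed format
sample = ["aardvark\u00d7N", "run\u00d7Vti"]
print(parse_moby_pos(sample))

# Loading the real file (file name and encoding are assumptions):
# with open("mobypos.txt", encoding="latin-1") as f:
#     pos_by_word = parse_moby_pos(f)
```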

Dictionary Projects

There are some projects with very flexible licenses you can take advantage of to create an English dictionary for your own use. Some need a little work before they are ready to be used with your favorite programming language. From there, you can either build your own web service or create an offline dictionary feature for your applications.
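Whichever source you pick, an offline dictionary feature mostly comes down to storing parsed entries somewhere your application can query quickly. Below is a minimal sketch, assuming SQLite (Python's built-in `sqlite3` module) as the store and a flat word / part of speech / definition / example table; the schema and the helper names `create_store`, `add_senses` and `lookup` are hypothetical, not taken from any of the projects below.

```python
import sqlite3


def create_store(path=":memory:"):
    """Open (or create) a SQLite file holding one row per word sense."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS senses (
               word TEXT NOT NULL,
               pos TEXT,
               definition TEXT NOT NULL,
               example TEXT)"""
    )
    # An index on word keeps lookups fast even with hundreds of thousands of rows
    conn.execute("CREATE INDEX IF NOT EXISTS idx_senses_word ON senses(word)")
    return conn


def add_senses(conn, rows):
    """Insert (word, pos, definition, example) tuples; words are stored lowercase."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO senses (word, pos, definition, example) VALUES (?, ?, ?, ?)",
            ((word.lower(), pos, definition, example) for word, pos, definition, example in rows),
        )


def lookup(conn, word):
    """Return every stored sense of a word, in insertion order."""
    cursor = conn.execute(
        "SELECT pos, definition, example FROM senses WHERE word = ? ORDER BY rowid",
        (word.lower(),),
    )
    return [{"pos": p, "definition": d, "example": e} for p, d, e in cursor]


conn = create_store()
add_senses(conn, [("adjustment", "n",
                   "The act of adjusting, or condition of being adjusted; "
                   "act of bringing into proper relations; regulation.", None)])
print(lookup(conn, "Adjustment"))
```

Passing a file path such as `"dictionary.db"` instead of `":memory:"` gives a single file you can ship with an app; the same table could just as well sit behind a small web service.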

Disclaimer: I'm not a lawyer. If you plan to use these in your commercial projects, you should consult your lawyers about licenses and copyrights.

Wiktionary

Wiktionary is a community dictionary project and a companion to Wikipedia. Wiktionary is itself a wiki, which means anyone can contribute and make it even better. A regular Wiktionary word entry includes etymology, pronunciation, definitions, part of speech, example sentences, related terms, antonyms and much more. Some entries even have audio pronunciations.

Wiktionary presents itself as web pages, which makes its entries unsuitable for programmatic use straight away. Of course, we can use web scraping techniques to transform them into a more approachable structure. There are quite a few open source projects that aim to parse Wiktionary content into other formats (search "Wiktionary" on GitHub), although I haven't tried any of them yet.
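Rather than scraping the rendered HTML, a lighter option is to ask MediaWiki for a page's raw wikitext (`index.php?title=...&action=raw`) and pull out the English definition lines. Below is a minimal sketch, assuming the usual entry layout: an `==English==` language section, part-of-speech headings such as `===Noun===`, and definitions on lines starting with `# ` (example lines start with `#:`). Templates such as `{{lb|en|law}}` are left untouched; the function names, the short list of part-of-speech headings and the User-Agent string are hypothetical choices, and the sample wikitext is made up in Wiktionary's style.

```python
import re
import urllib.parse
import urllib.request

POS_HEADINGS = {"Noun", "Verb", "Adjective", "Adverb", "Pronoun",
                "Preposition", "Conjunction", "Interjection"}
HEADING = re.compile(r"^(=+)\s*(.*?)\s*\1\s*$")       # ==Title==, ===Title===, ...
LINK = re.compile(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]")  # [[target|label]] -> label


def fetch_wikitext(word):
    """Download the raw wikitext of an English Wiktionary page (needs network access)."""
    query = urllib.parse.urlencode({"title": word, "action": "raw"})
    url = "https://en.wiktionary.org/w/index.php?" + query
    request = urllib.request.Request(url, headers={"User-Agent": "dictionary-sketch/0.1"})
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def parse_english_senses(wikitext):
    """Collect {"pos", "definition"} dicts from the ==English== section only."""
    senses, in_english, pos = [], False, None
    for line in wikitext.splitlines():
        heading = HEADING.match(line)
        if heading:
            level, title = len(heading.group(1)), heading.group(2)
            if level == 2:  # a new language section starts
                in_english, pos = title == "English", None
            elif in_english:
                pos = title if title in POS_HEADINGS else None
            continue
        if in_english and pos and line.startswith("# "):
            senses.append({"pos": pos, "definition": LINK.sub(r"\1", line[2:]).strip()})
    return senses


SAMPLE = """==English==
===Noun===
# The act of [[adjust]]ing.
#: {{ux|en|a minor adjustment}}
# {{lb|en|law}} Settlement of [[claim]]s.

====Synonyms====
* [[regulation]]

==French==
===Noun===
# [[adjustment]]
"""

for sense in parse_english_senses(SAMPLE):
    print(sense)
```

`parse_english_senses(fetch_wikitext("adjustment"))` would run the same parser against the live page. Entry layouts vary, so expect this to miss cases that the dedicated parsers handle.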

As Wiktionary is constantly updated by the community and has quite an open license, if you are looking for a modern, competitive dictionary, Wiktionary would be your best bet.

GCIDE

GCIDE is short for GNU Collaborative International Dictionary of English. It's a free dictionary derived from the public domain Webster's Revised Unabridged Dictionary mentioned above, supplemented with some new definitions from other sources such as Wordnet. GCIDE presents its content in simple ASCII text, so to make it more suitable for modern programming we need a structured format.

GCIDE XML is a project that tried to convert GCIDE content into XML. A regular GCIDE XML entry contains definitions, part of speech, hyphenation and a few other things. Be aware, though, that converting unstructured data into structured data is not a trivial task, and some information is lost in the process. There may be projects on GitHub dealing with GCIDE and GCIDE XML; if you are interested, you may want to look them up.
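Once the content is in XML, the standard library can do the rest. Below is a minimal sketch, assuming each entry is an element with `<ent>` (entry word), `<hw>` (headword with syllable and stress marks), `<pos>` and `<def>` children; these tag names are borrowed from GCIDE's own markup, and the actual GCIDE XML schema may differ, so check the files you download and adjust them. The function `parse_gcide_xml` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def text_of(element):
    """Flatten an element's text (including nested tags) and collapse whitespace."""
    if element is None:
        return None
    return " ".join("".join(element.itertext()).split())


def parse_gcide_xml(xml_text):
    """Return one dict per element that has an <ent> child (assumed entry layout)."""
    root = ET.fromstring(xml_text)
    entries = []
    for element in root.iter():
        ent = element.find("ent")
        if ent is None:
            continue
        entries.append({
            "word": text_of(ent),
            "hyphenation": text_of(element.find("hw")),
            "pos": text_of(element.find("pos")),
            "definitions": [text_of(d) for d in element.findall("def")],
        })
    return entries


SAMPLE = '''<dictionary>
  <p>
    <ent>Adjustment</ent>
    <hw>Ad*just"ment</hw>
    <pos>n.</pos>
    <def>The act of adjusting, or condition of being adjusted;
      act of bringing into proper relations; regulation.</def>
    <def>Settlement of claims.</def>
  </p>
</dictionary>'''

for entry in parse_gcide_xml(SAMPLE):
    print(entry)
```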

While GCIDE is a nice community project and has its uses, I would not recommend it if you are looking for a simple solution for a dictionary service, as it's neither simple to deploy nor rich in information.

Wordnet

Wordnet is a lexical database of English. It groups words into sets of synonyms (synsets) and provides short definitions and usage examples; its primary use is in text analysis and artificial intelligence applications. Simply put, we can see Wordnet as a combination of a dictionary and a thesaurus.

Although there are existing programs to browse Wordnet content, it is meant to be read by machines, and it is an essential resource for many natural language processing projects. An interface to Wordnet is offered in Python's Natural Language Toolkit (NLTK), and there are implementations for other languages if you prefer.

From Wordnet you can easily extract definitions, parts of speech and example sentences. But that's not all: synonyms and antonyms can also be obtained. One drawback of Wordnet is that it doesn't include pronunciations, so if you need them you have to get them from other sources. You may want to check this post, as it dives a little deeper into using Wordnet. The code below demonstrates how easy it is to obtain a simple dictionary from Wordnet.

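import nltk
from nltk.corpus import wordnet as wn

# Download the Wordnet data on first run (skipped if it is already present)
nltk.download('wordnet', quiet=True)

dictionary = {}

# wn.words() yields every lemma name in Wordnet; multi-word lemmas use underscores
for word in wn.words():
  dictionary[word] = []
  # Each synset is one sense of the word
  for synset in wn.synsets(word):
    dictionary[word].append({'pos': synset.pos(),  # 'n', 'v', 'a', 's' or 'r'
                             'definition': synset.definition(),
                             'examples': synset.examples()})

# The full dictionary is huge, so print a single entry
print(dictionary['adjustment'])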

Conclusions

Thanks to the community's efforts to bring openness to knowledge and creative works, today we can obtain the sophisticated work of experts with little to no restriction. By using these resources, you can create a dictionary for your product. If you are interested in similar resources for languages other than English, Wiktionary and Wordnet are great places to start, as they offer counterparts for quite a few languages.