Peer reviewed
ERIC Number: EJ1084218
Record Type: Journal
Publication Date: 2016-Jan
Pages: 28
Abstractor: As Provided
ISSN: 0305-0009
Estimating the Latent Number of Types in Growing Corpora with Reduced Cost-Accuracy Trade-Off
Hidaka, Shohei
Journal of Child Language, v43 n1 p107-134 Jan 2016
The number of unique words in children's speech is one of the most basic statistics indicating their language development. We may, however, face difficulties when trying to accurately evaluate the number of unique words in a child's growing corpus over time with a limited sample size. This study proposes a novel technique to estimate the latent number of words from a series of words uttered by children. This technique utilizes statistical properties of the number of types as a function of the number of sampled tokens. We tested the practical effectiveness of the proposed method in the empirical data analysis of cross-sectional and longitudinal samples. The converging empirical evidence indicates that the proposed estimator improves the accuracy of vocabulary size estimation over a set of existing estimators. Utilizing this efficient estimator, we propose a new sampling scheme for vocabulary assessment that has lower cost and higher accuracy compared to existing methods.
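The quantity the abstract builds on, the number of types as a function of the number of sampled tokens, can be illustrated with a minimal sketch. This is not the estimator proposed in the article; it only computes the observed type-token growth curve for a toy utterance. The function name `type_token_curve` and the sample sentence are assumptions made for illustration.

```python
def type_token_curve(tokens):
    """Return the number of distinct types observed after each token."""
    seen = set()
    curve = []
    for tok in tokens:
        seen.add(tok)            # a repeated token does not add a new type
        curve.append(len(seen))  # observed types after this many tokens
    return curve


# Toy sample (hypothetical data, not taken from the article).
tokens = "the dog saw the cat and the cat saw the dog".split()
curve = type_token_curve(tokens)
print(curve)  # [1, 2, 3, 3, 4, 5, 5, 5, 5, 5, 5]
```

The observed count at the end of the curve can only undercount the speaker's vocabulary, since words that were never sampled are missing from it. That gap is why a latent-vocabulary estimator is needed when the sample is small.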
Cambridge University Press. 100 Brook Hill Drive, West Nyack, NY 10994-2133. Tel: 800-872-7423; Tel: 845-353-7500; Fax: 845-353-4141
Publication Type: Journal Articles; Reports - Research
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Grant or Contract Numbers: N/A