**ERIC Number:**ED545706

**Record Type:**Non-Journal

**Publication Date:**2012

**Pages:**110

**Abstractor:**As Provided

**Reference Count:**N/A

**ISBN:**978-1-2675-5987-6

**ISSN:**N/A

Tractable Analysis for Large Social Networks

Zhang, Bin

ProQuest LLC, Ph.D. Dissertation, Carnegie Mellon University

Social scientists are usually more interested in consumers' dichotomous choices, such as whether to purchase a product or adopt a technology. However, to date, almost no model exists that can help us compare multi-network effects with a dichotomous dependent variable. Furthermore, the study of multi-network effects comparison faces another challenge: analyzing large, even huge, networks. Contemporary data collection and retrieval technologies, together with the pervasiveness of online social networks, give us access to networks with up to billions of nodes. However, much current social network analysis (SNA) software cannot handle large network data. Therefore, instead of studying "the" whole network, we may want to extract subpopulations from the network and study them in a comparative fashion. We could extract multiple distinct connected subpopulations from the network through random restarts, analyze the multi-network effects in each subpopulation, and then use a meta-analysis-like method to generalize the parameters to the whole population. In my thesis, I want to provide methodological and applied solutions that make multi-network effects comparison tractable among the same group of actors embedded in large social networks. Often, we need networks that are relatively small (e.g., fewer than 1,000 nodes) and internally well connected, but that at the same time do not have many ties to the external network. In large measure, I do not want boundary leakage with the external network contaminating the structure of the extracted networks. In the first chapter I develop a novel technique, the T-CLAP algorithm, that can extract subpopulations from large-scale networks quickly and with minimal boundary leakage. I propose a new measure, the I-E ratio, to evaluate the quality of the subpopulation returned. 
I also compare my method, in terms of speed and quality, with two popular community detection methods that have similar objectives: modularity spectral optimization and greedy maximization of local modularity. Experiments show that my method gives superior results to the other two methods. In Chapter 2, I develop a probit model with multiple network autocorrelation terms, mNAP, to study the competing network effects in each of the distinct subpopulations extracted using the sampling technique from Chapter 1. I develop two solutions: the first uses the Expectation-Maximization (E-M) algorithm, which is similar to maximum likelihood, a traditional and widely adopted method; the second uses a hierarchical Bayesian approach. Both solutions are among the first of their kind. I also study the behavior of both solutions, for example, how sensitive each solution is to changes in the parameters' prior distributions. Preliminary experiments show that the hierarchical Bayesian solution is more appropriate than E-M in this context. My software is also validated using the posterior quantile method (Cook et al., 2006). I also plan to study whether my solutions return correctly estimated parameters, using real and simulated data. With the T-CLAP algorithm and the mNAP model developed, I can finally revisit some traditional social network analysis problems using large network data. In the final chapter, I investigate which social network model, cohesion or structural equivalence, plays a more influential role in explaining innovation diffusion. While considerable work has been done on these models, the question of which network model explains diffusion has not been resolved, particularly in a large-data context. This chapter examines the diffusion of Caller Ring Back Tones in a cellular telephone network. 
Since these societal-scale networks are very large (e.g., our call detail record data set, from a large cellular service provider in Asia, has more than one million customers and one billion calls over a three-month period), the study of diffusion in these settings requires the application of T-CLAP to extract connected subpopulations from the network. Using the mNAP model, we study the competing effects of cohesion and role equivalence in each of the distinct subpopulations detected. The comparison of the results from the two models shows that both cohesion and role equivalence are statistically significant. The size of the cohesion effect is consistent across different subpopulations, while that of role equivalence varies with the size of the subpopulation. The results show a consistent pattern across different subpopulation sizes. Such results suggest different promotion schemes for social groups of different sizes. (Abstract shortened by UMI.) [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by Telephone 1-800-521-0600. Web page: http://www.proquest.com/en-US/products/dissertations/individuals.shtml.]
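
The abstract names the I-E ratio as a quality measure for extracted subpopulations but does not define it. The sketch below shows one plausible reading, assumed here purely for illustration: the number of ties with both endpoints inside the subpopulation divided by the number of ties that cross its boundary, so that less boundary leakage gives a higher value. The function name `internal_external_ratio` and the edge-list representation are hypothetical and are not taken from the dissertation.

```python
from typing import Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def internal_external_ratio(edges: Iterable[Edge], members: Set[Hashable]) -> float:
    """Ratio of internal ties to boundary-crossing ties for a node subset.

    ASSUMPTION: this definition is an illustrative guess at what an
    "I-E ratio" could measure; the dissertation may define it differently.
    """
    internal = 0  # ties with both endpoints inside the subpopulation
    external = 0  # ties with exactly one endpoint inside (boundary leakage)
    for u, v in edges:
        u_in = u in members
        v_in = v in members
        if u_in and v_in:
            internal += 1
        elif u_in or v_in:
            external += 1
        # ties entirely outside the subpopulation are ignored
    if external == 0:
        # no boundary ties at all: treat the subpopulation as fully isolated
        return float("inf")
    return internal / external


# Toy undirected network: a triangle {1, 2, 3} attached to a short tail 4-5.
toy_edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5)]
triangle_ratio = internal_external_ratio(toy_edges, {1, 2, 3})
tail_ratio = internal_external_ratio(toy_edges, {4, 5})
```

Under this assumed definition, the triangle has three internal ties and one boundary tie, while the tail has one of each, so the triangle scores higher and would be the better-bounded extract.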

Descriptors: Social Networks, Network Analysis, Comparative Analysis, Population Groups, Data Analysis, Measurement Objectives, Measurement Techniques, Database Management Systems, Data Processing, Information Networks, Mathematical Applications, Effect Size, Test Construction, Item Analysis, Robustness (Statistics), Program Validation, Comparative Testing, Evaluation Methods

ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site: http://www.proquest.com/en-US/products/dissertations/individuals.shtml

**Publication Type:**Dissertations/Theses - Doctoral Dissertations

**Education Level:**N/A

**Audience:**N/A

**Language:**English

**Sponsor:**N/A

**Authoring Institution:**N/A