



S1: ...(appears) to be registered. Every year we lose a few people from this class between about the first and the third meeting, and it actually turns out to be much better to lose people early than late because, as you probably know, over at ISR, if you wait a while you pay a lot more. Basically, this is a class that requires a fair amount of prior knowledge and skill, plus some commitment and motivation to do the work, so people who don't have that whole combination sometimes drop away, and it's better for them. You might notice, and it's certainly apparent to us, that during about the first week, maybe ten days, of this class, the differences between you in your skills and familiarity with SPSS and in how well you can use UNIX seem very large. (For quite a lot of people, finding the right building at the right time is a problem at first.) But I can guarantee that a week from now there won't be a dime's worth of difference between you, because we move so fast. So if you're feeling uncertain about things, you'll get certain about them quite quickly. I want to encourage you again to stop and ask whenever you don't understand something I'm saying or need some clarification. This goes for David and Katy also, because it's very hard for us to know what you don't get unless you ask. In terms of talking about things that are in the book, and I will start doing that today, I'm going to try as much as I can to stick with the terminology and notation the book uses. It's been useful for me to move closer to what's in the book, because it gives all of you who have it something to fall back on.
So if, once in a while, I slip, or the notation I've used in the handouts or something I write on the board is different from what you've seen in the book, please stop me and we'll sort it out, because some of it is quite precise. Since the next time you come to lecture you'll actually be bringing your first assignment, I thought it would be worthwhile to spend a little time today talking about the assignment, because there are some general things to say about it. If you happen to have your syllabus with you, I'm looking at the paragraph on page eight that describes the assignment. Here's what we'd like you to do. This shows how much we're expecting you to learn and do between now and a few days from now, but by the time you actually get to writing it up, most people don't really have any trouble once they get over the hump. You need to go into the codebook for our ECLS data (toward the end of today's class I'll talk a little more about the data), and you need to select a simple multilevel question that can be addressed with these data. That means you need to select a dependent variable, a few child-level variables, and a few school-level variables. I would encourage you not to select a huge number of these. The first time or two that you do this, creating a sufficient statistics matrix, writing out the data, and so on to get ready to run an HLM will seem so complicated that you'll be thinking, well, why don't I just do this once? Then I can use this one SSM, what we call a sufficient statistics matrix, more than once. That's certainly a possibility. However, I'll tell you right now: you will be making hundreds of these.
So there's no reason to try to pack everything you might ever want to ask into this first one. I would suggest no more than three or four child-level variables and three or four school-level variables. In fact, to do your first assignment you don't really even need these; all you really need is a dependent variable. But I want you to get used to thinking about variables at one level, variables at another, and the kind of question you could address with them. That's the purpose of this. If you happen to create a sufficient statistics matrix with these variables and everything's perfect, then you can use it thereafter, but chances are you'll want to do it again. So that's the first step. Well, there are a lot of steps, aren't there? Choose a question. Next, choose some variables to address that question. You need variables at two levels, the child or individual level and the group or school level, so you need variables from both of the system files listed in the codebook. On Friday you will learn how to do this. The next thing you will do is compute the intraclass correlation for your outcome, which we talked about the other day: that's the proportion of variance on this outcome that lies systematically between schools. You'll also get an estimate of the HLM version of reliability for this outcome. We'll talk a little more about how that type of reliability differs from the kind you may be familiar with, Cronbach's alpha; for the moment, just think of them generally as the same. Each is an indicator of how good a measure the outcome variable is for doing the kind of thing you want. So that's the task. Now, how will you present it?
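As a rough illustration of the two quantities just named, here is a minimal Python sketch. It assumes a balanced design (the same number of children in every school) and uses the classical one-way ANOVA (method-of-moments) estimates of the between-school variance τ00 and the within-school variance σ², not the maximum-likelihood estimates the HLM program itself reports. The function names and toy scores are hypothetical. The intraclass correlation is ρ = τ00 / (τ00 + σ²), and the reliability of school j's mean based on n_j children is λj = τ00 / (τ00 + σ²/n_j).

```python
from statistics import mean

def variance_components(groups):
    """Method-of-moments estimates (tau00, sigma2) from a one-way ANOVA.

    groups: one list of outcome scores per school. This sketch assumes every
    school contributes the same number of children (a balanced design).
    """
    n_schools = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes a balanced design")
    school_means = [mean(g) for g in groups]
    grand_mean = mean(school_means)  # equals the overall mean when balanced
    # Within-school mean square: children's spread around their own school mean.
    ms_within = sum((x - m) ** 2
                    for g, m in zip(groups, school_means)
                    for x in g) / (n_schools * (n - 1))
    # Between-school mean square: spread of school means around the grand mean.
    ms_between = n * sum((m - grand_mean) ** 2 for m in school_means) / (n_schools - 1)
    sigma2 = ms_within
    tau00 = max((ms_between - ms_within) / n, 0.0)  # truncate a negative estimate at zero
    return tau00, sigma2

def icc(tau00, sigma2):
    """Proportion of outcome variance lying systematically between schools."""
    return tau00 / (tau00 + sigma2)

def reliability(tau00, sigma2, n_j):
    """Reliability of a school's sample mean based on n_j children."""
    return tau00 / (tau00 + sigma2 / n_j)

# Toy scores for three schools of four children each (made up for illustration).
scores = [[10, 12, 11, 13], [15, 17, 16, 14], [20, 19, 21, 22]]
tau00, sigma2 = variance_components(scores)
print(round(icc(tau00, sigma2), 3))  # → 0.923

# A hypothetical outcome with ICC = 0.10 (tau00 = 0.1, sigma2 = 0.9):
print(round(reliability(0.1, 0.9, 17), 2))  # → 0.65
print(round(reliability(0.1, 0.9, 50), 2))  # → 0.85
```

The last two lines show why within-school sample size matters: the same hypothetical outcome is a noticeably more reliable school-level measure with fifty children per school than with seventeen.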
Well, when you write up your paper, you'll want to lay out your research question. It would be a good idea to include a few sentences about why you think it's a good question and about its multilevel nature; I want to see your thinking and your writing. In fact, I want to see your thinking and your writing in every paper, not just your computing. Then you do your analysis. Be sure to include with your paper all of the SPSS output you needed to do this (or your SAS output, if you're using SAS), the command file for writing out the data, and the log file for the HLM run, just so we have a record of everything you've done. Then you'll want to write something about what you've learned: about the intraclass correlation, and about how much promise this outcome has for future analyses. I don't have any problem with your doing this six or seven times until you get a good one, but I also don't have any problem with your choosing one, doing it, and learning from what you've gotten. The first dependent variable you choose may be one that really doesn't have a lot of promise, but you'll have learned a lot by doing it. This is the first step in any HLM analysis, and the purpose of this first exercise is to make sure you all have some sense of conceptualizing a multilevel question, dealing with the data, and writing it up. When you do this, you'll want a cover page with all the things that would go on a cover page. I don't care exactly how you include the output. A lot of people cut and paste it into a text file and put that into their paper, and that's fine; if you want to literally cut it out and paste it onto a piece of paper, that's okay too. Anything else, David or Katy, that I should say about this assignment?
I don't want to get people too hyped up before they even know what they're doing, but I do want to say that I have some expectations for the form of your presentation and for your ability to write about what you've done. Some people who've come to this class say, oh, wait a minute, when we're taking statistics classes, all we want to do is show you that we know how to do things. Not in any class that I ever teach. Writing it up is important. The write-up here is much less than we'll expect in future papers; however, I want to see that you can wrap words around the ideas you're working with, [S2: the only] starting with stating the question.
S2: The only thing is, remind us tomorrow if we forget to show you how to edit output so that it saves you pages [S1: okay] <SU-F LAUGH> and you don't use up your printing. We can show you how to do that using Microsoft Word.
S1: Yeah, be sure... [S2: it can save you time and space] This is a different level of skill from learning how to do HLM and writing out raw data in SPSS and that sort of thing, but it is no less useful. If you have a little trouble with it the first time, David and Katy would be happy to show you how to do it again. Because we've done this before, we know that in the first week of the class, the Friday office hours of whichever TA has them tend to be extremely well visited. So I think this week they'll both be there, right? David and Katy both have their office down on the third floor, so they'll be there together, and there are a couple of computers in there. If by the end of the lab you feel you're not on top of this, then, since the assignment is due on Monday and David and Katy are not planning to spend the weekend at your beck and call, that would be a good time to come and visit. If you come in groups, that's fine; in some sense it's more efficient. If you feel you're on top of it, which many of you will, don't feel obligated to visit them; they're not hungry for company. We're all just hungry for people to feel pretty comfortable with this first level of HLM. [SU-F: (xx)] Let me add something else, based on what's happened before. It's very hard for any of us to answer questions of the type "I tried this and it didn't work" over email. Basically, someone needs to look at your output to see what the problem might have been.
So feel free to send email messages all the time, but also expect that once in a while you're going to have to come in with a piece of paper in your hand to show us, because it's hard to know what fouled up. And things can foul up. By the time you're at this level maybe I don't need to say this, but I'll say it anyway: sometimes people take output from the computer that says, in effect, "you have failed, this is wrong," too personally. It doesn't mean that you have failed; <SU-F LAUGH> it means that you and this computer program have not communicated correctly. <LAUGH> So don't take these things seriously. <LAUGH> As a matter of fact, people who do take them very seriously don't really go very far into data analysis. Given where all of you are now, you've probably had hundreds or thousands of those error messages already; you don't learn to get it right unless you've gotten it wrong a few times. We have visitors today. Do you want to tell us what you're doing? <LAUGH> I communicated with somebody so many weeks ago...
S3: (xx) One other question about this assignment. [S1: mhm] You said, for the variable selection for the [S1: right] research (portion)... I had a look at the codebook, and it's all sorted into areas like children, teacher, demographics, and principal characteristics. Would it be a better step to choose the independent variables within one of these domains, or to choose, say, four across the domains, or...?
S1: Within or across, I think either is fine. It really doesn't make a bit of difference. In fact, I can tell you that when you get the electronic codebook for ECLS, the variables are not sorted this way; we did this to make it a little easier for you [SU-F: yeah] when you're conceptualizing. Oh, something else I'd like to say: all these data, of which you are using a subset, are available on a CD free of charge, and I have copies of this CD for all of you. I'm happy to give it to you at any time if you would like to have about five thousand variables on twenty thousand cases right now, or I'll pass them out later in the semester. Some of you are going to get very wrapped up in the substance here. Some of you are never going to get wrapped up in the substance; you're just going to do what you need to do to learn HLM. But if you do get wrapped up in the substance, you may find that the subset of variables we've selected for this class file is wholly inadequate. I can guarantee you that for every little subset in this codebook there are many, many more variables. You just go to the electronic codebook and pull them off, right? That's not the easiest thing in the world, but David definitely <LAUGH> knows how to do that too. Where do you think this data file <LAUGH> came from?
So I'm prepared to give one of these to all of you. It just seemed easier to ask the National Center for Education Statistics one time for a whole class than to have twenty people sending independent emails. At first they were quite reluctant to send so many, but when I said, well, okay, that would be fine, you're going to get a lot of different emails from people requesting this, they said, oh, I see the point. <LAUGH> So I have them in my office right now, and you're welcome to them any time. I'll definitely pass them out sometime toward the end, but my guess is that right this minute is not the right time. <LAUGH> Katy?
S4: I just wanted to say that those categories are arbitrary. We put them in after we did the codebook, so that it would be easier for you to find particular variables if you're interested in demographics or something else. So they're totally arbitrary, and you don't have to choose (xx)
S1: And they may even be wrong. In other words, you may look at a particular variable and think that it actually doesn't belong in this [S4: yeah] category, that it belongs in another one. I think [S4: <LAUGH> (xx)] that's because the headings are a little ambiguous as well. When you're creating a codebook to be used in a class, it's always a problem to strike a balance. We want enough variables that people can do reasonable analyses that are pretty interesting to them and to us (of course, we're interested in these data and these questions), but we don't want to overwhelm people with thousands of variables. So we may have erred on the side of making it too simple in order to make it accessible, but I'm telling you, there is tons of information available. The other thing you should know about ECLS (I was going to talk about this at the end of the semester, but I'll mention a little of it right now) is that they collected a huge amount of data from each child's kindergarten teacher, and it exists on the CD in a separate teacher-level file, because teachers are matched to students. Because of the design of this particular study, we don't have a separate teacher file here. But if you think for a minute about the structure of early elementary school in the United States, and my guess is it's probably pretty similar in other countries, you essentially have children nested in classrooms, which are nested in schools. So there are really three levels of data, not two. In theory, many of the questions one would want to ask using the Early Childhood Longitudinal Study would in fact take that three-level nesting into account, or might even ignore schools as the nesting and just think of children nested in classrooms.
On the other hand, the sampling for this study followed the same general design as other studies conducted by the National Center for Education Statistics, which is to sample a fixed and rather small number of children per school, no more than about twenty-five. We know from these data that the average elementary school in the United States has three kindergarten classrooms. That doesn't mean there are very many with exactly three; it just means that there are some large ones with maybe six or seven. There's a school not very far from here that includes only pre-kindergarten and kindergarten for the whole city of Ypsilanti, so they have a whole lot of kindergarten classrooms. That's really very unusual; we've found from these data that it's quite unusual. The problem is that if you have children nested in classrooms and classrooms nested in schools, and you have only three classrooms on average in each school, that's a pretty small N. So it's a big problem with these data. However, they are the newest data from the U.S. Department of Education, and it seems important to get them in front of people who might be interested in analyzing them. That's why we have not included classroom as a separate unit of analysis, because I have a big grant to do this; it's an issue that has vexed us enormously. Klaus?
S5: In that larger file, are there variables above the school level?
S1: uhuh. mhm.
S5: Because I didn't find any (in the)
S1: Well, [S5: (there)] you didn't find any there because we made a decision not to include too many. And also, if you think about what affects small children in American schools, the school may not be where the action is. But be my guest. [S5: uh uh] Take the CD and pull as many other variables off it as you want. It's also completely possible to create aggregates [S5: (okay)] of child-level characteristics. One thing you do have: we sampled schools, and then we took every child in those schools. So it's not as if the larger file has fifty or a hundred kids per school while here you have only an average of seventeen children per school; this was the study's sampling. So any time you want to do that... I would not suggest it right this minute, [S5: (sorry yeah)] because getting into the CD can kind of confuse you, but if you want to, I've got a whole box full. Okay, any questions about the first assignment before I plunge on? What I plan to do today is essentially finish the part of the course I would call the background or introductory part, by talking about the Bidwell and Kasarda article, and then start actually talking about HLM. I realize that I'm merging over into chapter two in the book, but so be it. Alright, so I want to talk about the Bidwell and Kasarda article. It was published in nineteen eighty (I don't know how much earlier it was actually written) and is called "Conceptualizing and Measuring the Effects of Schools and Schooling." As I said the other day, this is one of the articles I wanted you to read that were written before any multilevel software was available, and it seems to me an unusually good example of why you need it.
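Creating an aggregate of a child-level characteristic amounts to computing each school's mean and attaching it to every child in that school. In the course this is done in SPSS or SAS; the Python below is only a minimal sketch of the logic, and the field names `school_id` and `ses` and the three-child data set are hypothetical.

```python
from collections import defaultdict

def school_means(children, var, school_key="school_id"):
    """Mean of a child-level variable within each school."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for child in children:
        totals[child[school_key]] += child[var]
        counts[child[school_key]] += 1
    return {school: totals[school] / counts[school] for school in totals}

def add_school_aggregate(children, var, school_key="school_id"):
    """Return copies of the child records with the school mean of `var` attached as `mean_<var>`."""
    means = school_means(children, var, school_key)
    return [{**child, f"mean_{var}": means[child[school_key]]} for child in children]

# Hypothetical child-level records: two children in school A, one in school B.
children = [
    {"school_id": "A", "ses": 0.5},
    {"school_id": "A", "ses": -0.1},
    {"school_id": "B", "ses": 1.0},
]
for row in add_school_aggregate(children, "ses"):
    print(row)
```

Because every child in a school receives the same value, `mean_ses` behaves as a school-level (compositional) variable, even though it was built from child-level data.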
These are two very eminent sociologists. I don't know about Kasarda, but I know that Charles Bidwell is still going strong; he's in the sociology department at the University of Chicago. He's been a sociologist of education for a long, long time, and he's been head of the Department of Sociology and head of the Department of Education. Nobody's the head of the Department of Education at the University of Chicago now, because they closed it down, but that doesn't mean there aren't still people there doing sociology of education; Charles Bidwell is one of them. In the title, and also in the article, they differentiate between two concepts that may not seem obviously different: one is called school and the other is called schooling. Schooling is the process of instruction. They've simplified it in this article to the process of instruction in schools, since teaching or instruction is the main work of schools, but a slightly broader way to think about schooling is: what happens to children in school? Not everything that happens to children in school has to do with instruction. School itself, in this conception and, I think, more generally in the sociology of education, which is my field, is the organization of the place where this instruction occurs. So they are different things. This is a very strongly sociological argument, and I don't think you would see this kind of writing in a paper written by psychologists, although you would see it more now than you would have twenty years ago. As they said in this article, written twenty years ago, and it's still true, research has been much more successful at finding the effects of schooling than the effects of schools.
Well, one reason is that, at least in their conceptualization, <WRITING ON BOARD NEXT :45> you might have schools, meaning their organization, influencing what goes on in them, <:08 PAUSE WHILE WRITING ON BOARD> which in turn influences some kind of <:16 PAUSE WHILE WRITING ON BOARD> outcome. Clearly, in this conceptualization, and I'm pushing ideas for HLM here, the outcome is an individual-level concept. Learning, or any other outcome for students, happens among individuals, and there's quite a lot of variability between individuals in the same school, and in the same class. Schools themselves, by contrast, are concepts related to the organization of the place, and schooling is somewhere in between.
S6: Maria, is there a way [S1: Heather?] to model reciprocal directional relationships between those in HLM?
S1: No, this isn't a LISREL-type program. I suppose one could always turn them around... well, no, you really couldn't. As a matter of fact, we recently did a study using these ECLS data in which we were extremely interested to know which kinds of kids go to which kinds of schools. We conceptualized schools along about fourteen or fifteen different dimensions of school quality: how well educated the teachers are, the teachers' experience, the condition of the neighborhood; we had a long list. That essentially assumes that you have school-level outcomes and individual-level predictors, which I think is what you're asking about. There's no HLM that could be done that way. I'd be happy to share that paper with you and how we did it. Nobody has ever seemed to object, but I've done about three papers of this type, and it gives me a lot of trouble either way. So, no, there isn't, but it's a good question. As I mentioned the other day, and these authors certainly support this, research has generally shown very modest school effects, particularly pre-HLM research. Most of the research that did show school effects, way back before HLM, really focused on school composition, the aggregate characteristics of the students who go there: for instance, the average SES of the school, its racial composition, or the average ability level of children as they come into school. We're in a particularly good situation to look at the average ability of the kids in each school here, because we have measures of cognitive performance as kids enter kindergarten. At most public schools in the United States, kindergarten is the first grade offered.
In fact, in the structure of education in the United States, this is the first time that every child has access to free public education. We do offer some publicly supported preschool programs, but in the United States those programs are reserved for poor children, and they're not universally available. Head Start isn't available to everybody; it's only available to families with quite low income. There are state-supported preschool programs as well, but these are restricted to poor kids too. So this is the first point at which universal public education is offered to everybody. In this article they call for more attention to what they call the social organization of schools. That phrase flows off my lips because it's the kind of thing I think about all the time. To clarify it a little, it might mean looking at questions like: which children get which services inside the school? Who has access to what? It might have to do with the distribution of resources; in fact, the word distribution is extremely important here. They call attention to the fact that school attributes are more than just central tendencies, meaning aggregate characteristics of individuals, and they think it's important that school variables in a school effects analysis be measured close to where the instruction actually occurs. That's probably what's driving Klaus's thinking when he wants more variables, and that's fine; I think that's great. It's really quite wonderful to recognize that so <LAUGH> early. Bidwell and Kasarda argue that this kind of poor conceptualization in research looking for school effects is both a modeling problem and a measurement problem.
The idea I mentioned the other day, which came up all the time around the original Coleman Report, is the assumption that resources like the number of books in the library are similar to how many books children are reading. Books in the library are useless unless people are reading them. They obviously become a resource that people can draw on, but the question is, who draws on them? Nobody really seems to keep track of which children are reading which books; maybe their teachers do. These authors, and they're not unusual among sociologists of education, are sharply critical of the strain of research called status attainment studies, which was very prominent in the sociology of education in the sixties and seventies; the Coleman Report is a great example of the status attainment model. Among the problems they mention are the following. Status attainment research focuses almost entirely on individuals, and it assumes that the behavior of individuals is purely personal or rational, coming from motivation or from ability, and is somehow not influenced by the context in which that behavior is actually manifested. This type of research ignores the social context, which seems pretty important; social context is really what they're interested in. They call that social context proximate school settings: these might be the classroom workplaces, little ability groups, and a whole lot of other things. I'm trying mostly to push this toward little kids. If you assume that resources are equally available to all children in schools, you're making a big mistake; they're not.
So in the status attainment view, schools would succeed by socializing students to the desired behaviors, instead of by making sure that access is equitable. As a matter of fact, there are some people who think that access to resources shouldn't even be equitable; it should be stacked so that more resources go to the children who need them most, a kind of compensatory model. Plenty of people think that, but it doesn't very often happen. The small amount of money for education in the U.S. that comes from the federal government is mostly organized around this compensatory model: we put extra money into schools for bilingual education or for children with disabilities, and we have something called Title I, which goes to schools with many low-income children. People from other countries are kind of shocked to find that less than ten percent of the total cost of financing education in the United States (and I'm not talking about higher education, I'm talking about elementary and secondary education) actually comes from the federal government. There are really no other countries where it's like that. So where does the money come from? It comes from states and from localities. So what we have in the United States is quite a lot of inequality in education, because so much of the money that pays for the schools comes from the tax base where the schools are located. Ann Arbor, among cities in Michigan, has a very high per-pupil expenditure, whereas there are some cities between here and Detroit where expenditures are extremely low, and they're right next to each other. This is a state of great inequalities.
S2: Ypsilanti
S1: Ypsilanti, and just keep going: Taylor, Romulus, and so on. Then you go north from Detroit, <SS LAUGH> and those are the richest school districts in the state. So there are great inequalities in the resources children can draw on, and they are specifically school-by-school or district-by-district differences. In ECLS, except in large cities, some effort was made not to draw two schools from the same school district, so you can assume that some of the inequalities among the schools in this data file come from that. Of course, I'm only talking about public schools; private schools are different. In the United States there's really no public money that goes to private schools at all. Okay, they also have a section on conceptualization and modeling, and what's most relevant to the work we're doing here is their view that school effects, which many pieces of research have shown to be very small, primarily tap policies and practices that influence students' access to resources for schooling, and yet most studies don't really talk about access. <P :06> I want to talk through with you the page in the handout that looks like this. <HOLDS UP HANDOUT> <P :05> So if you could get that in front of you, that would be good. <P :05> These numbers describe what's called a Monte Carlo study. A Monte Carlo study is essentially a study conducted with fake, made-up data. They created a data set from which this study is drawn, and it includes twenty-five hundred students: fifty students in each of fifty schools. Oh, that we had real data sets with that design!
fifty students in each school, or what i would like is a hundred students in each school, or two hundred students in each school. but the Department of Education continues to have a thousand schools and not very many students in each school, because that's how they keep the price down for these studies, which is really quite a problem. so someday, when all of you are in charge of planning data collections, please remember that fact. we want lots of people in each school, and as you start doing your analysis and talking about the importance of the within-school or within-group sample size, it will become very clear to you how important that is. okay, so here's this Monte Carlo study, and the outcome variable is some measure of student achievement. when you're making up the data you can do anything you want, so it's some test score. and there're two independent variables. X-one is a measure of student-level social class or family background; we'll just call it S-E-S. X-two is a measure of the student's cumulative receipt of school inputs, of school resources. so X-two is some measure of school resources, and X-one is a measure of individual family background, or family resources. and in this study, they have set the correlation between X-one and the outcome variable Y, and between X-two and Y, to be equivalent at the start. and the difference between column one and column two is that in column one the school resources are allocated to schools randomly, meaning that the school resources are quite equal across the fifty schools, and in column two the school resources are allocated to schools based on students' family background. 
so, from my point of view, column one is so pie-in-the-sky we never have it, and column two is the only place where the action is, because in the United States anyway, school resources are quite differential based on family background, for all the reasons i just told you. <P :05> okay, now that i've sorta told you what column one and column two are, let's look at the difference between model one, model two, model three and model four. okay? in model one, all three variables, X-one, X-two and Y, are measured at the individual level. in model two, Y and X-one are at the individual level, and X-two is measured as a school aggregate. so notice in model two, X-two has a little bar over the top, meaning the average. model three has X-two at the individual level, as Y always is, and X-one measured as an aggregate. and model four has both X-one and X-two as aggregates, and only Y is at the individual level. so you got these group- and individual-level variables. um, now, as far as i'm concerned, of these eight different models on this page, only one of them is really tapping the realistic situation, and that is model two in the right-hand column. now remember, this is fake data, where they've been able to manipulate whether a variable is used as an aggregate or not, and whether the X-two variables are allocated differentially by family background or not. so let's just pay attention to what happens as you move to different models, and i'm mostly focusing on the right-hand column, partially grouped by X-one, cuz i think that's realistic. 
alright, if you notice in model one, where X-one and X-two are both at the individual level and Y is also, the regression coefficients, and it's not really a correlation, they use regression coefficients, these are little regressions, the coefficients between X-one and Y and between X-two and Y are relatively similar, and X-one and X-two are also correlated with each other at about the same level, so everything's about point-two, okay? we're thinking of these as beta coefficients, okay? but notice what happens between model one and model two when X-two is an aggregate... the relationship between X-two and Y just about disappears. well, prior to H-L-M this is how every school effects study was done, and they never found any school effects. but notice what happens when you go down to model three, which is really something you would very seldom have, where you have an aggregate measure of students' family background and an individual-level measure of school resources: the X-two coefficient on Y gets big again and the X-one coefficient goes way down. <P :05> and if you look at model four, which is probably the most unrealistic, although people have been known to run analyses like this, the beta coefficients for both X-one and X-two on Y are very small, but look what happens to the correlation between X-one average and X-two average: it goes way up. now remember i said the last time we had class that the relationship between social class and achievement was like point-seven or point-eight when it was aggregated, and like point-two or point-three when it was not aggregated. that's what we're talking about here. all the correlation has gone into the correlation between X-one and X-two. so basically, the realistic situation among these eight models is model two in the right-hand column. 
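The aggregation effect described above can be sketched with a tiny Monte Carlo of our own. This is not the handout's study: the generating model (school mean SES `m_j`, resources `r_j` that either track `m_j` or are drawn at random) and every number in it are made-up assumptions. It only shows that when resources are allocated by family background, the correlation between school means of X-one and X-two is far larger than the student-level correlation.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation, computed directly so no third-party packages are needed."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(grouped, n_schools=50, n_per_school=50, seed=1):
    """Return (student-level r, school-mean r) between X1 (SES) and X2 (resources).

    Hypothetical generating model, not the handout's: each school j has a mean
    SES m_j, and each student's SES is m_j plus noise. If `grouped` is True the
    school's resource level tracks m_j; otherwise it is drawn independently.
    """
    rng = random.Random(seed)
    x1, x2, x1_bar, x2_bar = [], [], [], []
    for _ in range(n_schools):
        m_j = rng.gauss(0, 0.5)
        r_j = m_j + rng.gauss(0, 0.1) if grouped else rng.gauss(0, 0.5)
        ses = [m_j + rng.gauss(0, 1) for _ in range(n_per_school)]
        res = [r_j + rng.gauss(0, 1) for _ in range(n_per_school)]
        x1.extend(ses)
        x2.extend(res)
        x1_bar.append(statistics.fmean(ses))
        x2_bar.append(statistics.fmean(res))
    return pearson(x1, x2), pearson(x1_bar, x2_bar)


for grouped in (False, True):
    ind_r, agg_r = simulate(grouped)
    label = "grouped by X1" if grouped else "random"
    print(f"{label:>14}: student-level r = {ind_r:.2f}, school-mean r = {agg_r:.2f}")
```

With these settings the grouped case gives a modest student-level correlation and a very high school-mean correlation. That pattern echoes, without reproducing, the jump the handout shows for model four.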
because school resources are measured at the school level. and individual-level background is hopefully measured at the individual level, and notice in fact if you use regression which is what they've used here the influence of X-two on Y is, nothing. i don't know_ i mean they didn't really test statistical significance but on fake data it's not really very important here, the size of the coefficient it's gone.
S3: um, just one question
S1: sure
S3: i got that so far there's only one more coefficient this this thing that's always point-nine-nine-something. i didn't quite get what that meant. 
S1: oh oh you mean this this the the round thing over here on the left? is that what you're talking about?
S5: no
S3: uh the uh, (xx) 
S1: oh, sorry, okay, i didn't even really talk about this. um... when you do an analysis and you want to estimate how good your model is, then you'd be looking at the R-square figure, right? they don't have it here on the paper, but we can pretty much figure out what the R-square is, because the square root of one minus R-square tells you how much error there is in the analysis, and that's what this point-nine-four-two is. it's the square root of one minus R-square. so R-square wasn't very big here, and it doesn't look like it'd be very big either, since you only got two predictors. but it's not really meaningful, and in fact i would probably argue that most of the time this R-square figure is not very important. i guess during the course of this class you're gonna hear a lot of my other feelings about data analysis. mostly, in doing data analysis, we're interested in posing particular questions that are interesting to us, and we're not interested in explaining the total variance in our outcome by throwing in everything but the kitchen sink. 
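The arithmetic behind "square root of one minus R-square" can be checked in a couple of lines. This sketch simply inverts the relationship for the point-nine-four-two read off the handout; the function names are our own.

```python
import math


def r_squared_from_residual_coefficient(k):
    """Invert k = sqrt(1 - R^2) to recover R^2."""
    return 1.0 - k ** 2


def residual_coefficient(r_squared):
    """sqrt(1 - R^2): ratio of the residual SD to the SD of the outcome."""
    return math.sqrt(1.0 - r_squared)


r2 = r_squared_from_residual_coefficient(0.942)  # value read off the handout
print(f"R-square is about {r2:.3f}")  # R-square is about 0.113
```

So a residual coefficient of .942 corresponds to an R-square of roughly .11, consistent with "R-square wasn't very big here."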
so, by looking only at R-square of how good is our model that would be the kind of what i would call the kitchen sink model, and in education data even the kitchen sink model is not gonna do you very well, the_ unless you have, unless your outcome is, test score at time two, which it very well might be, and one of your predictors is test score at time one which it very well might be in these data cuz we got time one and time two, because the correlation between a test of reading, achievement, at time one and a test of reading achievement at time two and in theory we'd like to think well that's nine months later cuz school_ the school year is nine months long, in fact it's much less than nine months because they certainly didn't take these tests on the first day school started, nor did they take them on the last day school was over, so th- the actual difference between the time one and time two measures is more like six to seven months. in fact did we put that on the file that (thing called) test gap is it on here? 
S2: (i don't think so) 
S1: we might wanna do that sometime it's kinda interesting cuz we, we do have a measure they did indicate, in this school for this test for these kids what day did you do the testing? and then they had what day they did the testing and, because you only have this short time period between, that w- to_ that we can measure learning which is, gains on the test between time one and time two, it's actually pretty important, how much time there was in between. little children learn very fast. so, in fact we learned that kind of after we made this class file and we could put it on there sometime (xx,) somebody gets steamed up about it. alright. i hope this has convinced you, well... i don't know whether convinced is the word uh it slightly, has has has led you to maybe believe that in fact, the level at which you measure variables is extremely important number one, um, i mean look what happened to the coefficients of X-one and X-two as we had them aggregated or not aggregated, uh number one, and number two, uh that using regression, to look at a multi-level question because this model two i- uh the one i'm so fond of, is in fact a multi-level question if we're interested in school resources X-two and their influence on achievement after we've taken s- uh children's family background into account i mean that's a quite reasonable multi-level question, using regression, we get the answer to be nothing. so from here on, we're leaving regression behind, and now we're gonna talk about H-L-M for the rest of the semester, so here it comes. w- w- introduction over, now we're ready to move. now the purpose here of this, overview of the logic of Hierarchical Linear Modeling, is to introduce you to, what's it all about? and... as i said i'm trying to use the notation that's, used in the book um and and uh, and i want you to be able to follow this. 
on the other hand, let me tell you that my own personal predilection is, that it's very important for you to understand these equations, to know what they're saying but, as you get on to, writing about, the work you do, i'm all in favor of moving away from the equations. because using equations and Greek letters has, a lot of negative, aspects to it. number one, anybody who doesn't know what you're talking about is immediately totally completely turned off. so, try writing, something about H-L-M with equations in it to somebody who doesn't know anything about H-L-M and they won't even want to read to page two of your paper. um, uh i i told you the other day that i you know i spent two weeks in Brazil not too long ago talking about H-L-M and, my host who was a member of this class last year um, was quite, distressed with me, because in my talks i was not using any equations you know he had learned these equations and he he, they'd been so useful to him that he wanted everybody to learn them so i said okay then you teach it to these people in the seminar but that's not what i'm about. but today i am about that. today, and the next time we're gonna kinda go through these things and i want you to understand them and if you feel they're very useful, between you and me and the T-As you can use these equations all you want but in the end, i mean you gotta be able to explain these things in real words. so here we go. i'm gonna stick with the example in the book because i want you to be able to understand this one really well, and it's also true, that the example that's in the book is something that's really very near and dear to me and i- i- and you might, i mean you probably don't wanna know but you're gonna find out anyway, that in fact many of these original analyses, i ran them. the data_ the data set Steve Drett and i_ Tress and i put together so like i you know some of these numbers that're in here are very familiar to me, because i made them myself. 
now, we're now, on to, what page is that? cuz the page number. (in p-) i guess it's page four. 
SU-F: four
S1: alright. now, i am gonna write this again on the board <:07 PAUSE WHILE WRITING ON BOARD> even though you got it right in front of you. <:22 PAUSE WHILE WRITING ON BOARD> okay. now we were just thinking of a little regression, this is_ i'm hoping that is a terminology that you've seen before. we got a little regression inside a single school where we have the achievement, that Y is achievement, for student I, as a function of that student's social class some regression coefficient that relates social class to achievement and an intercept. now what is the intercept?
S3: where the uh, regression line cuts the uh, 
SU-M: one of the axes 
S1: true. 
S3: yeah
S1: cuts what?
SS: (the y-axis)
SU-M: the axis
S1: can someone expand on that? the intercept is the value of Y when X is zero. <LAUGH> and the regression coefficient for this is the change in Y for every one-unit change in X. isn't that familiar? i hope so; if that's new news, you're in the wrong room. and there's an individual error term for every student I, for how much of this achievement is not explained by social class. so in this case the error term is very large. and we assume <:05 PAUSE WHILE WRITING ON BOARD> that the error term is normally distributed, with a mean of zero and a variance of sigma squared. okay, now right here i want to introduce a concept that will keep popping up; it's popping up in its first form here, and it will pop up again. and that's something called centering. <:07 PAUSE WHILE WRITING ON BOARD> now centering is a simple concept, and you don't really ever have to do it, but all we're doing by centering is subtracting <:09 PAUSE WHILE WRITING ON BOARD> the school mean from the individual student's social class. when we do that, the intercept takes on more meaning. it's then the expected achievement for a student whose centered social class is zero, that is, a student right at the school's mean social class, which works out to the school's average achievement, because if you subtract the school mean from a student who is exactly at that mean, you get zero. you don't have to do this, but it sure is useful, and the main reason it's useful is that it makes the intercept more meaningful. okay, so that's for a single school. but even given these two, we really only have two variables here: student achievement and student social class. the schools can differ in two ways. they can differ in terms of the intercept, and they can differ in terms of the relationship between social class and achievement, cuz that's what the slope is. 
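A minimal sketch of centering inside one school, using six made-up students (the SES and score values are hypothetical). It fits the same least-squares line twice: once on raw SES, once on SES minus the school mean. The slope does not change; only the intercept moves, and after centering it equals the school's mean score.

```python
import statistics


def ols(x, y):
    """Intercept and slope of a simple least-squares regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical single school: z-scored SES and a test score for six students.
ses = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]
score = [41.0, 47.0, 50.0, 49.0, 55.0, 60.0]

raw_intercept, raw_slope = ols(ses, score)          # intercept: predicted score at SES = 0
school_mean_ses = statistics.fmean(ses)
centered_ses = [s - school_mean_ses for s in ses]   # group-mean centering
c_intercept, c_slope = ols(centered_ses, score)     # intercept: school's mean score
```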
<POINTING TO BOARD> and if, the intercept is higher, in school one <:05 PAUSE WHILE WRITING ON BOARD> than in school two <:06 PAUSE WHILE WRITING ON BOARD> that means that average achievement is higher in school one and s- than school two and we might say that school one is more effective than school two... but we also can vary the schools in terms of the relationship between social class and achievement, in school one and, in school two. and, somehow, it is desirable, that, the relationship between social class and a- become_ in a perfect world, the relationship between social class and achievement would be zero. children's achievement would not be related to the social class of their family that's the perfect world. so we're thinking that maybe if school one had a lower, relationship between social class and achievement than school two, then in fact we might call school one more equitable. <P :08> and then school one would really be better than school two in two ways. it both would be more effective it would have higher average achievement and it would be more equitable the relationship between social class and achievement would be lower. so everyone might not agree that those are good qualities of schools but, you better keep your <LAUGH> uh lack of agreement to yourselves because i think they're really important. okay. now, let's talk about, a lot of schools because, i mean, when you're talking about multi-level modeling you need a lotta schools. in this case, <P :13 PAUSE WHILE WRITING ON BOARD> and we're gonna fix this little guy, a little bit. we're gonna center him. <:16 PAUSE WHILE WRITING ON BOARD> so now we have, the achievement of student I in school J. we've got a lotta schools so we got, a lot of 'em, J is just what we use. is equal to the, intercept, the average achievement in school J, and the relationship between social class and achievement, this is B not I but B-one, J. 
because you wouldn't have B I (when) it doesn't_ this relationship wouldn't vary, for each student this is an average measure in each school, and we've just simply centered our, relationship_ i mean our our social class m- measure, in each school. so we've taken each child's, social class, and we have subtracted from that, the average social class in school J. and of course the average social class, could vary quite a lot across schools. <P :05> and then we still have an individual-level error term for, student I in school J. <P :06> and we still have... the error term is normally distributed and... with a mean of zero, and there_ cuz there's an error term for each student, but the but the variance of those error terms varies between schools, and essentially this is just the variance and we'll just say it in another way <:10 WHILE WRITING ON BOARD> now this little guy sigma-squared, is extremely important. i mean everything here is_ this is like, crucial. information. so now we have a lot of schools. and this is still what we call, our, within-school model. <:09 PAUSE WHILE WRITING ON BOARD> now why do i say that? because o- the only two variables we're considering here are, measured at the individual level. <P :06> but notice now we will have, a beta-zero term and a beta-one term for each school. we'll have a different one for each school. remember i said, the other day, that one way to think about this is a lotta little regressions? i mean this looks pretty much like a regression. so if you have a lotta little regressions, you're gonna have J regressions one for each school, and you're gonna have uh, an intercept and a uh, re- regression coefficient between social class and achievement for each school. and we might call <:06 PAUSE WHILE WRITING ON BOARD> this one <:07 PAUSE WHILE WRITING ON BOARD> the effectiveness parameter... and this one we might call the equity parameter. 
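The board itself was not captured in the transcript. Reconstructed from the spoken description, and assuming the book's usual symbols, the within-school (level-one) model reads roughly as follows, with the intercept playing the role of the effectiveness parameter and the slope the equity parameter:

```latex
Y_{ij} = \beta_{0j} + \beta_{1j}\,\bigl(X_{ij} - \bar{X}_{\cdot j}\bigr) + r_{ij},
\qquad r_{ij} \sim N(0, \sigma^{2})
```

Here $Y_{ij}$ is the achievement of student $i$ in school $j$, $X_{ij}$ is that student's social class, and $\bar{X}_{\cdot j}$ is the mean social class in school $j$.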
now these terms are only relevant, to this particular situation where we're talking about achievement and social class. 
S8: just, what was your measure of social class, in this?
S1: well, the standard measure of social class, the same one that you have on this file, includes three common things. i don't want this to become too much of a sociology class, but it's certainly important in education to know these things, and you could probably explain this better than me. um, family income, parents' education, and parents' occupation. so the parents report a family income; they took the education of each parent into account, but only one if it was a single-parent family; and they took the occupational prestige of the work of each parent, whether there were two or only one, and they created this social class measure. on the file that you have, the average social class is zero, so it's already centered, not within each school but on the population. the average is zero, so zero is a middle-class kid, and it's what we call a Z-score variable: mean of zero and a standard deviation of one. i didn't actually check to see; we didn't re-Z-score those, did we?
S2: uh, (for these) 
S4: not for the sample. no, they were Z-scored with the whole sample,
S1: just took from the whole sample yeah. so they're_ so they might [S4: not for our (xx) sample. ] not be, mean of zero. on the whole population. 
S4: but it's really close.
S1: but it pret- it should be close. but we dropped a few crowded schools. so, anyway, it should be close. alright. let's talk some statistics here for a minute okay? 
S2: (xx) it's not that close, so he might wanna re-Z em.
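For anyone who does want to re-Z-score social class on the analysis sample, a minimal sketch follows. The input values are hypothetical, and whether to divide by n − 1 (`stdev`) or n (`pstdev`) is a choice we are making here, not something the lecture specifies.

```python
import statistics


def z_scores(values):
    """Re-standardize values to mean 0 and SD 1 within the sample at hand."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # n - 1 denominator; use pstdev for n
    return [(v - mean) / sd for v in values]


# Hypothetical SES values as they might sit on the class file, standardized
# against the full sample rather than the analysis sample.
ses_on_file = [0.35, -0.10, 0.80, 0.05, -0.45, 1.10, 0.60, -0.20]
ses_rez = z_scores(ses_on_file)
```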
S1: let's not worry about that now, but [S2: yeah ] later on, i can tell there're some fanatics in the class who will immediately want to do that. very easy. okay, we got a lotta parameters here, alright? i'm giving you most of the main terminology in all of H-L-M right here on this page right now; there's not gonna be a whole lot more than this. we've got an expected value for beta-zero. expected value is a term we use when we're talking statistics; when we wanna talk like everyday folks, we might actually say the mean across all the schools. that's just the mean of these beta-zero terms across all the schools, and we call this gamma-zero. gamma is a level-two term in H-L-M, and beta is a level-one term. okay, now, think about this for a minute. we're now treating beta-zero as a variable in itself, which i don't think you've ever done before. it has a mean and it has a standard deviation, or a variance. and this is where it gets a little trickier, because these things are actually estimates. so it has a mean, an expected value, and it has a variance. and the variance of the intercept we call tau-zero-zero: the population variance across all these means. now, think back to what i said the other day. remember the intraclass correlation? it's what proportion of the overall variance in the outcome lies systematically between schools, and we're certainly gonna need this tau term to be able to compute that. okay, we also have the expected value of the relationship between social class and achievement, and we will call this gamma-one. this is the average of that relationship, slope is an easy word here, across all the schools. that parameter, beta-one, also has a variance, and we call this tau-one-one. 
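The intraclass correlation mentioned here is the between-school variance as a share of the total, usually computed from a model with no predictors. A minimal sketch; the variance components below are hypothetical, not taken from the book or the class file.

```python
def intraclass_correlation(tau00, sigma2):
    """Share of outcome variance lying between schools: tau00 / (tau00 + sigma2)."""
    if tau00 < 0 or sigma2 <= 0:
        raise ValueError("tau00 must be >= 0 and sigma2 must be > 0")
    return tau00 / (tau00 + sigma2)


# Hypothetical variance components, for illustration only.
rho = intraclass_correlation(tau00=10.0, sigma2=40.0)
print(rho)  # 0.2
```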
we'll get to why it's one-one in a minute, but it's the population variance of the slopes. actually, we don't care that much about tau-one-one; we care very much about tau-zero-zero. and it's also true that there is some relationship between the intercept and the slope, and that's just the population covariance, tau-zero-one. now typically the variance of the intercept is much larger than the variance of the slope; slopes are measured with a lotta error. so okay, this is our level-one model, where we have individual-level achievement and individual-level social class, in this case centered around the school mean, and out of our individual-level models across lotsa schools we have two parameters, beta-zero and beta-one. now we come to the second part of H-L-M, the between-school model, which is essentially asking: what are the characteristics of schools that influence those two things? i shouldn't've written that on a blackboard that was nailed to the wall. well, it's alright, you have it in front of you. <:30 CHANGING BLACKBOARD AND WRITING ON BOARD> so we're gonna use these things as outcomes, <:13 PAUSE WHILE WRITING ON BOARD> and we're gonna model them as a function of something, and that something we're gonna call W. <:21 PAUSE WHILE WRITING ON BOARD> and here's what W is. W is a dummy variable: it's zero if the school is public, and it's one if the school is Catholic. <P :10> okay. now we have a lot of things here that you've probably never seen before, unless of course you read them in the book. gamma-zero-zero, well, that's the mean of means. but since we've introduced a level-two variable here, it's the mean of the school means, cuz these are school-level, for those schools where W is coded zero. so that's the mean of means, the grand mean... for public schools... 
and gamma-zero-one, which is zero because we're looking at the intercept and one because we've got this variable W here, is the achievement differential for Catholic schools. it could be a negative or a positive, although in general it's positive. so if we were interested in the average achievement for Catholic schools, once we've taken into account the social class of the kids who went to them, cuz that was back in the level-one model, then all we do is add gamma-zero-zero and gamma-zero-one. <P :06> correspondingly, this is the average relationship between social class and achievement for schools coded zero, the public schools, and this is the difference in the relationship between social class and achievement for Catholic schools. 
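The "just add the two gammas" logic of the dummy variable can be sketched directly. All gamma values below are illustrative assumptions, not estimates from the book or the lecture, and the school-level random effects are deliberately left out, so the results are averages for each type of school rather than predictions for one school.

```python
def expected_school_parameters(gamma, catholic):
    """Expected intercept (beta_0j) and SES slope (beta_1j) for one school type.

    gamma: dict with hypothetical keys "00", "01", "10", "11" for the gammas.
    catholic: True for W_j = 1 (Catholic), False for W_j = 0 (public).
    The random effects u_0j and u_1j are omitted, so these are averages over
    schools of that type, not predictions for a particular school.
    """
    w = 1 if catholic else 0
    intercept = gamma["00"] + gamma["01"] * w
    slope = gamma["10"] + gamma["11"] * w
    return intercept, slope


# Illustrative numbers only; they are not estimates from the book or the lecture.
gamma = {"00": 12.0, "01": 1.5, "10": 2.75, "11": -1.25}
public = expected_school_parameters(gamma, catholic=False)    # (12.0, 2.75)
catholic = expected_school_parameters(gamma, catholic=True)   # (13.5, 1.5)
```

A negative gamma-one-one, as in these made-up numbers, would mean a flatter social class slope in Catholic schools, that is, more equitable in the lecture's terms.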
S3: Maria i think that it's gamma-one-zero. under beta-one-J equals, 
S1: oh yeah yeah yes cuz it's the intercept. it's_ yes y- otherwise the_ we got two of 'em thanks a lot David. now let's just go through that again okay? you can stare it on your paper you can stare at it o- at it, hm. gamma-zero-zero, is the average achievement, across all schools for public schools. now remember, that we've also taken into account the social class of the child. that's how we got these. <P :05> gamma-zero-one is the average difference between Catholic and public schools. and we can just say difference cuz of the way we coded it that's a_ th- that's a beauty of dummy variables. gamma-one-zero, is the average relationship between social class and achievement in public schools. and gamma-one-one is the difference in that relationship between Catholic and public schools. and then we've got these error terms. the error term associated with the, with the intercept and then the er- error term associated with, the, slope. so, we can combine these, and i'll tell you right now in H-L-M you always run these two together. it's two analyses that are run simultaneously as a matter of fact to tell you the truth, you run the whole thing together. we write these equations as though you're doing each one of them separately but in fact you're really not. <P :07> so we can put this, into one big whopping, combined model. <:10 PAUSE WHILE WRITING ON BOARD> and we do this just by substituting i- every term. okay we start with individual-level achievement for each child in each school is <:11 PAUSE WHILE WRITING ON BOARD> and this is our estimate of, for beta-zero, the intercept. then we have another set of terms about the slope. <:42 PAUSE WHILE WRITING ON BOARD> and this is, all the stuff about the slope. and then we have something very ugly. <:20 PAUSE WHILE WRITING ON BOARD> which is our, error term. now conceptually, it's not important that you memorize this at all but it is really important that you understand what we're doing. 
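Written out in the book-style notation assumed earlier, the two between-school equations and the combined model obtained by substitution look like this (a reconstruction from the spoken walk-through, not a transcription of the board):

```latex
\begin{aligned}
\beta_{0j} &= \gamma_{00} + \gamma_{01} W_j + u_{0j}\\
\beta_{1j} &= \gamma_{10} + \gamma_{11} W_j + u_{1j}\\[4pt]
Y_{ij} &= \gamma_{00} + \gamma_{01} W_j
        + \gamma_{10}\bigl(X_{ij} - \bar{X}_{\cdot j}\bigr)
        + \gamma_{11} W_j \bigl(X_{ij} - \bar{X}_{\cdot j}\bigr)\\
       &\quad + u_{0j} + u_{1j}\bigl(X_{ij} - \bar{X}_{\cdot j}\bigr) + r_{ij}
\end{aligned}
```

with $\operatorname{Var}(u_{0j}) = \tau_{00}$, $\operatorname{Var}(u_{1j}) = \tau_{11}$ and $\operatorname{Cov}(u_{0j}, u_{1j}) = \tau_{01}$. The last line is the "very ugly" error term: because $u_{1j}$ multiplies the centered predictor, its variance changes with $X_{ij}$ and differs from school to school.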
the error term is extremely ugly and unpleasant and complex: the errors are not independent of the predictor variable, which we always want them to be; they're not necessarily normally distributed; and they're not constant either within or between schools. so the error term is extremely complex... if i've left out anything. <P :08> so this is not a standard ordinary least-squares regression, because of this complex error term, and as a result we can't do this kind of analysis using regression in any sense. so this program uses maximum likelihood estimation and iterates until the estimates converge, that is, until another pass no longer changes them in any meaningful way. essentially, you can't really do this with regression. 
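To make "iterates" concrete, here is a minimal sketch of one classic iterative maximum likelihood scheme, the EM algorithm, for the simplest random-intercept model with no predictors. It is not necessarily the algorithm the HLM program uses internally, and the three tiny schools are hypothetical.

```python
import statistics


def em_random_intercept(schools, tol=1e-12, max_iter=100_000):
    """EM estimates for the random-intercept model Y_ij = gamma00 + u_0j + r_ij.

    schools: dict mapping a (hypothetical) school id to a list of scores.
    Returns (gamma00, tau00, sigma2): the grand mean, the between-school
    variance of the intercepts, and the within-school variance.
    """
    scores = [y for ys in schools.values() for y in ys]
    n_total = len(scores)
    gamma = statistics.fmean(scores)
    tau = sigma2 = statistics.pvariance(scores)  # crude, positive starting values

    for _ in range(max_iter):
        # E-step: posterior mean m and variance v of each school's u_0j,
        # given the current parameter values.
        post = {}
        for j, ys in schools.items():
            v = 1.0 / (len(ys) / sigma2 + 1.0 / tau)
            m = v * sum(y - gamma for y in ys) / sigma2
            post[j] = (m, v)

        # M-step: closed-form maximizers of the expected complete-data log-likelihood.
        new_gamma = sum(y - post[j][0] for j, ys in schools.items() for y in ys) / n_total
        new_tau = statistics.fmean([m * m + v for m, v in post.values()])
        new_sigma2 = sum(
            (y - new_gamma - post[j][0]) ** 2 + post[j][1]
            for j, ys in schools.items()
            for y in ys
        ) / n_total

        change = max(abs(new_gamma - gamma), abs(new_tau - tau), abs(new_sigma2 - sigma2))
        gamma, tau, sigma2 = new_gamma, new_tau, new_sigma2
        if change < tol:  # stop once another pass barely moves the estimates
            break

    return gamma, tau, sigma2


# Three tiny hypothetical schools with equal sizes.
data = {"A": [10.0, 12.0, 14.0], "B": [20.0, 22.0, 24.0], "C": [28.0, 32.0, 30.0]}
gamma00, tau00, sigma2 = em_random_intercept(data)
```

Because these made-up schools are all the same size, the maximum likelihood answers are also available in closed form (grand mean 64/3, σ² = 4, τ00 = 1428/27 ≈ 52.89), which makes a convenient check that the iteration lands in the right place.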
SU-M: so you really can't assume, if it's, you really can't assume hetero- hetera- (xx) that_ (xx) 
S1: you can't assume any of the normal things you want to assume about an error term. [SU-M: okay ] period. it's it varies with the variables, it's not normally distributed, you know all that stuff [SU-M: (xx) ] that we like to write, normally distributed with a mean of zero and a standard de- uh uh variance of something, all gone. so you need iterative procedures to be enabled to do this. okay let's generalize a little bit. generally in H-L-M, we talk about level one and level two those are the more general terms, we could talk about level three, but in the course of this course, we won't talk about level three. uh, sometime toward the end of the course during the last week we will have a day when a few people are invited to come in and talk about more complex uses of H-L-M, and there's certainly no reason to do that now, y- i mean but it's good to do it when you get to the point where you're gonna be very solid i guarantee you, in doing this y- by the end_ three weeks from now you are gonna be very very solid_ well three weeks from now you will actually be presenting your final project so i mean think about it it's_ this is speed. uh um, speed learning and hopefully, it's not speed learning and speed forgetting. um, but sometime in that last week we will brin- w- i'll i'll invite, a few people in in fact i've already invited them and they've already agreed to come and, one person will talk about a three-level substantive model, another person will talk about a a um, using H-L-M with a dichotomous outcome, um cuz that's all in the program that you know you you're_ but we're not doing it here. and another person will talk about, using H-L-M in an entirely different way so, n- you're gonna know it al- the- by then. right now, you don't. so we're talking generally about level one and level two, and in our work here we're talking_ it's both in the book and in our d- data set level one is going to be students or individuals and level two is gonna be schools or groups. 
or, level one we could call them children also. what are called random effects: level-one error terms are written with R, and since they're level one, they're for each individual in each school, so it's R-I-J. level-two error terms are these Us, either U-zero for the intercept or U-one for the slope, and they vary across schools, so that's where the J comes from. level-one variance we call sigma-squared, and you get a little sigma-squared estimate for each school; mostly here i'm talking about the variance, right. and there's also a level-two variance, which we call tau. now the coefficients that we're interested in, or the word you might use instead of coefficients is parameters, are actually things we're interested in using as outcomes. in level one, well, they are outcomes from level one, they're beta terms. we'll stick with that all the time. and in level two they're gamma terms. so far we've talked about beta-zero and beta-one; now i think you can probably see that we could have a beta-one, beta-two, beta-three, beta-four if you put in a lotta predictors, right? well, remember we've got seventeen kids per school, so you don't wanna put in too many. <P :07> so right now we're just concentrating on the intercept and one slope, but hopefully you can see this can expand quite a lot. and the level-two coefficients in H-L-M are gamma coefficients: gamma-zero for the intercept and gamma-one for the slopes, or two or three, etcetera. we tend to use the same terminology over and over again: the level-one predictors are Xs, which is hopefully familiar to you, the level-two predictors we'll just call Ws, and the dependent variable we'll continue to call Y like we always did. <P :05> now... 
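The general notation just described, with Q level-one predictors (Xs) and some number of level-two predictors (Ws) for each level-one coefficient, can be summarized as below. This is a sketch in the book-style symbols assumed throughout; $S_q$ is our label for how many Ws enter the equation for $\beta_{qj}$.

```latex
\begin{aligned}
\text{Level 1:}\quad Y_{ij} &= \beta_{0j} + \sum_{q=1}^{Q} \beta_{qj} X_{qij} + r_{ij},
  \qquad r_{ij} \sim N(0, \sigma^{2})\\
\text{Level 2:}\quad \beta_{qj} &= \gamma_{q0} + \sum_{s=1}^{S_q} \gamma_{qs} W_{sj} + u_{qj},
  \qquad q = 0, 1, \dots, Q
\end{aligned}
```

With seventeen students per school, Q has to stay small, for the reason given in the lecture.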
S3: Maria?
S1: yes, Klaus? 
S5: i have a question. it might be repeating what she asked before, 
S1: oh no no i think [S3: but the (xx) ] that's_ i mean repeat is fine here.
S5: is is the the dependent variable always at the lower level? or, could it be at the higher level? or_ it has to be at 
S1: yes. always. yes. yes it is it's always measured on individuals. right. 
S5: and it has to be, otherwise the whole machinery wouldn't work out (xx) 
S1: otherwise, yes. we're talking about multi-level, where you have individuals nested in groups, so that's right. Heather asked a very good question, but it's a question that's completely irrelevant to this (program), and so it's nice to be able to put a few things outside what we're [S5: right ] gonna learn in four weeks, right? that's
S5: but does the program not work if you wanna, do bottom-up instead of top-down analysis. 
S1: no. no.
S5: okay. 
S6: but does the, lowest level al- 
S1: i'd be happy to talk to you privately [S5: yeah ] about something [S5: sure ] that's worried me for years, and how we've actually done it, which isn't necessarily successful, but it does seem to get <LAUGH> published in journals, so i guess, <SU-M LAUGH> i don't know, the field is kinda as mystified as i am.
S6: the lowest level always has to be an individual?
S1: no
S6: okay
SU-M: in that (problem) 
S6: you could have family, or family (xx) 
S1: you could have classrooms nested in schools, families nested in communities, definitely. [S6: okay ] but then you wouldn't have measures on individual people in the families; you'd have aggregates, so that's fine, yeah. there's an individual something, and it's nested in some higher-level grouping. that's (great). Luis? 
S7: that was gonna be my question. in other words, in my case what i'm interested in is, say, faculty salaries by department, with the department aggregate being the average salary of all the professors in each department, 
S1: uhuh, uhuh, nested in universities. 
S7: nested in universities. okay so it doesn't necessarily have to be in 
S1: absolutely. absolutely. and the program will allow you to have individuals nested in departments nested in schools as a three-level model. however, let's not talk about that (now) [S7: yeah ] okay? [S7: okay ] cuz that gets really complex. 
S7: so you can use an aggregate as, as the dependent variable?
S1: mhm, [S7: okay ] mhm. [S7: alright ] but you'd still want it to have some nice properties, like being reliable and also having a reasonable, [S7: distribution ] a reasonable, um, yeah, you want it to be normally distributed [S7: yeah ] that's right. remember we have those assumptions about dependent variables that're a little bit more stringent; they still apply here. but the other thing you'd want it to have is an intraclass correlation that is not really tiny. now, on this page seven there's all the information that you need for the intraclass correlation, so lemme just write it down right now. <:21 PAUSE WHILE WRITING ON BOARD> now the only thing that's a little different from what you have on the page is this: the page lists the level-one variance, do you see where it says the variance of R-I-J is sigma-squared-sub-J? right? are you with me? well, we don't have any sub-J over here, because this is the average of all the sigma-squareds across all the different Js. now, by tomorrow at this time you will have produced some output that will have all these numbers on it. think about what the definition of the intraclass correlation is: the proportion of the total variance in your outcome that lies systematically between groups. okay. well, there's the variance between groups, and here's the total variance, both the within-group variance pooled across schools and the between-group variance. i suppose we can just continue down. <P :12> Lynne? 
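The formula on the board divides the between-school variance, tau, by the total variance, tau plus the pooled sigma-squared. A minimal Python sketch of that hand calculation, assuming you have already read both variance estimates off your H-L-M output (usually from a model with no predictors); the function name and the numbers are hypothetical, not values from the E-C-L-S data.

```python
def intraclass_correlation(tau00: float, sigma2: float) -> float:
    """Share of total outcome variance that lies between schools.

    tau00:  between-school (level-two) variance of the intercept, tau
    sigma2: within-school (level-one) variance, sigma-squared, pooled across schools
    """
    if tau00 < 0 or sigma2 < 0:
        raise ValueError("variance components must be non-negative")
    total = tau00 + sigma2
    if total == 0:
        raise ValueError("total variance is zero; ICC is undefined")
    return tau00 / total


# Hypothetical estimates read off an unconditional model's output.
tau00_hat = 10.0
sigma2_hat = 40.0
print(f"ICC = {intraclass_correlation(tau00_hat, sigma2_hat):.2f}")  # ICC = 0.20
```

Here one fifth of the outcome variance would lie between schools; a value close to zero would suggest there is little between-school variation for level-two predictors to explain.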
S8: is that gamma-squared or standard deviation? 
SU-M: oh it's (xx) yeah
S1: oh, this is not gamma, i'm sorry. [SU-M: cuz it looks like gamma ] sorry, that's just me being sloppy. that's sigma. <:07 PAUSE WHILE WRITING ON BOARD> if you look back on that page here, page seven, under the variances, cuz that's what we're talking about: the intraclass correlation is about variances. the proportion of total variance that lies systematically between groups. okay. here's the variance that lies systematically between groups, and here's the total variance, between and within. <P :07> so i think you do have to actually compute that by hand, right? you have to pull sigma-squared off your output, you have to pull tau off your output, and you have to sit and compute this by hand, so a calculator's always useful if you really feel like you can't do these things in your head. okay, now that i've laid that bomb, let's go back to talking about the data for a little bit. as i said, the focus here is on kindergarten children. most of these children, as they enter school for the first time, are about five years old, and we do have age on there somewhere, i think, don't we? that's the age at which they came into school, and it's in months, so you can see how old they are. in the U-S, as i said, this is the first year of formal public schooling available for everybody. but it's also true that many of these children did not see school for the first time when they came to kindergarten. it's increasingly common that children have experienced some kind of educational something before they come to kindergarten. if we were talking about this fifteen years ago, we would probably ask, well, did they go to preschool? that would be what we're interested in, and probably a lot of you went to preschool. i certainly sent my children to preschool every second i could possibly find preschool; i was stuffing them into it. okay? 
um, and of course i had to pay for it, cuz i wasn't poor. <LAUGH> (you know) preschool you had to pay for. however, increasingly, in the U-S anyway, there's not a lot of difference between preschool and what we used to call, or sometimes called, day care. parents need some place to put their children so that mothers can go to work. well over fifty percent of mothers of children of this age are actually working, and we can probably figure out what the proportion is; it's quite high even for mothers of one-year-old children. so some children go into some kind of care at a very early age. my little grandson, who's eleven now, started day care at three months old, so in some ways you could say he's been institutionalized since he was three months old. okay, now, <SU-M LAUGH> he lives in New York so i don't visit him very often, but sometimes when i'd visit him in this day care you've gotta wonder, well, what do they do all day? you'd like to think that they're not just having a nap, or a graham cracker and a little Dixie cup of milk, which is what i used to see 'em doing all the time, or just playing; it seemed to me there should be something that looked like learning your letters, learning your numbers, etcetera. so those have merged into day care, and now we don't really talk about preschool anymore, we talk about child care. and child care comes in a lotta forms. so you do have a variable on this file for what form of child care this child had the year before he or she came to kindergarten. 
it's in there somewhere, and it's um, stay home with mom, uh, and they even have different forms of child care: you can have it with a relative or not with a relative, and you can have it in the home or in some kind of center. but we don't have any information, and this is really really difficult for me, and it's not that we left it out when we selected the variables for you, there just isn't any information about what i would call the intellectual or cognitive content of the child care that the child experienced the year before he or she came to kindergarten. now i can't even remember whether there's much information about child care before that year. oh, they also could've gone to Head Start. the Head Start bureau actually puts money into this data collection, so the measure of whether or not the child went to Head Start is very very accurate. any time a parent said my child went to Head Start, they asked the parent, well, where was that? what's the address and telephone number? and they called them up to find out whether it was really Head Start, because Head Start's such a common term that people don't even really know whether they're in it or not. but this is very accurate. and remember that Head Start is only available to families who are living below the poverty line, so quite poor families. so there's all that information, and there it is for you to do whatever you want with it. in the United States, thirty years ago, it was not that common; no more than about fifty percent of American kids actually attended kindergarten. kindergarten was seen as optional. now, virtually every five-year-old is in school. 
so kindergarten is now universal, and many states passed laws making it universal, so essentially this file includes a random sample of five-year-olds in the United States; there aren't too many that aren't in kindergarten. then you begin to think, okay, what does kindergarten itself look like? well, it's the first year of formal public schooling, but there's a tremendous debate in our society about what kindergarten should look like. actually there's some debate about what other grades should look like, but in kindergarten it's a little starker. to oversimplify what the debate is about: should these kids be playing, or should they be doing school, learning to read, etcetera? and that debate is far from settled. if you were to visit a kindergarten around here, and you better hurry cuz i think school's gonna close before our class closes, i think you would find a combination of both. they're not doing purely play or purely intellectual activities. it's also true that different schools in different locations have the right to decide how long school is in session. but mostly in American public schools, school starts about eight-thirty and ends somewhere between two and three in the afternoon, so those are full days, and mostly first grade is full-day. but kindergarten isn't mostly full-day or mostly half-day: about half these kids who are in kindergarten stay there all day and half of them don't. i know you have that variable on the file, and actually our team has done a study of the efficacy of full-day versus half-day kindergarten. most of the schools are either all half-day or all full-day; in other words, it's a school or a district decision whether kindergarten is half or full-day. 
but in about ten percent of the schools on the entire E-C-L-S file, and i'm not sure quite what it is in this file, some kids are half-day and some are full-day. so that's kind of an odd little quirk. and in fact this school i was telling you about down in Ypsilanti has an arrangement where all children in preschool or kindergarten who go to public school go to this one school, and it just stops at the end of kindergarten, so there are lots of kindergarten classes there. and Ypsilanti offers full-day kindergarten to some children, and others don't get it. it's sort of organized this way: if you're poor, you get full-day kindergarten for nothing. if you're not, you gotta pay for it if you want your kid to stay the rest of the day. so it's a very odd arrangement. there aren't too many schools like this in this file, but since one of 'em's right down the street here, a very large school with over ten kindergartens, um, all those variations exist. so it's definitely true in the United States that there's a real difference of opinion about how important kindergarten is. for instance, in Ann Arbor in the last few months there's really been a lot of public controversy about whether kindergarten should be full-day or half-day. i think David's got a child coming into kindergarten.
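Because full-day versus half-day is usually a school or district decision, the variable behaves almost like a school-level characteristic, except in the minority of schools (about ten percent on the full E-C-L-S file, per the lecture) where both kinds coexist. A minimal Python sketch of how one might flag those schools, assuming the data have been exported as (school ID, full-day flag) pairs; the function name and the data layout are hypothetical, not taken from the codebook.

```python
from collections import defaultdict


def classify_schools(records):
    """Label each school by the kindergarten programs its students attend.

    records: iterable of (school_id, full_day) pairs, full_day truthy for full-day.
    """
    seen = defaultdict(set)
    for school_id, full_day in records:
        seen[school_id].add(bool(full_day))
    labels = {}
    for school_id, values in seen.items():
        if values == {True}:
            labels[school_id] = "all full-day"
        elif values == {False}:
            labels[school_id] = "all half-day"
        else:
            labels[school_id] = "mixed"  # both program types within one school
    return labels


# Hypothetical toy records, one pair per child.
records = [("A", True), ("A", True), ("B", False), ("C", True), ("C", False)]
print(classify_schools(records))
# {'A': 'all full-day', 'B': 'all half-day', 'C': 'mixed'}
```

Only in the "mixed" schools does the full-day indicator vary within a school; everywhere else it carries no within-school information.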
S2: due to funding issues half-day. [S1: right ] not due to philosophical issues.
S1: now, that's Ann Arbor thinking about what their priorities are. this is a school district with one of the highest per-pupil expenditures in the state, but they still say they don't have enough money for full-day kindergarten, and the reason, and this is not unusual at all, is they don't really think it's that important. if they did, they'd have it. so it's been a big controversy across the country, full-day and half-day. there're a few states, like Nebraska, where the whole state is full-day kindergarten. there're some where it's school district by school district, and there're some where it's all half-day. and actually one of the things you have on your file that you might be interested in is, i hope we have this on the file, what region of the country the school is in. yeah. there are actually considerable regional differences in kindergarten. i've always ignored region, but this is really sort of important. okay. now, you're gonna hafta know something about this pretty soon, and what i'm talking about now is missing data. this is not a course about missing data, and don't i wish that everyone had already taken a course about how to deal with missing data, but usually nobody's even thought about it much. so let's talk about it for a minute. the first thing to say, and this is an important thing to say cuz you're gonna confront it by the time you do your first analysis, is that H-L-M allows no missing data at the school level. what that means is, if a variable that you choose to describe schools is missing for a school, that school and all the kids in it will be completely eliminated from your analysis. so is that okay? generally H-L-M is the kind of program where you don't wanna lose any data at any time, because you need all the data you can get. 
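To see how quickly school-level missing data costs you students, here is a minimal Python sketch that mimics the rule described above: any school missing a chosen level-two variable is dropped together with every child in it. It only illustrates the bookkeeping; it is not how the H-L-M program itself is implemented, and the variable names (`sector`, `enrollment`) and toy data are hypothetical.

```python
def drop_schools_missing_level2(schools, students, level2_vars):
    """Drop every school with a missing value on any chosen level-two variable,
    along with all of its students, and report what was lost.

    schools:     dict mapping school_id to a dict of school-level values (None = missing)
    students:    list of dicts, each with a "school_id" key
    level2_vars: names of the school-level variables chosen for the model
    """
    kept_ids = {
        sid for sid, row in schools.items()
        if all(row.get(v) is not None for v in level2_vars)
    }
    kept_students = [s for s in students if s["school_id"] in kept_ids]
    schools_lost = len(schools) - len(kept_ids)
    students_lost = len(students) - len(kept_students)
    return kept_ids, kept_students, schools_lost, students_lost


# Hypothetical toy data: school B is missing its enrollment value.
schools = {
    "A": {"sector": "public", "enrollment": 400},
    "B": {"sector": "catholic", "enrollment": None},
    "C": {"sector": "public", "enrollment": 250},
}
students = ([{"school_id": "A"}] * 3 + [{"school_id": "B"}] * 2
            + [{"school_id": "C"}] * 4)

kept, kept_kids, lost_s, lost_k = drop_schools_missing_level2(
    schools, students, ["sector", "enrollment"])
print(f"lost {lost_s} of {len(schools)} schools and {lost_k} of {len(students)} students")
# lost 1 of 3 schools and 2 of 9 students
```

Comparing these counts against the codebook Ns after every run is the habit the lecture recommends.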
secondly, if you were to lose schools and the students in them... what we have here is a nice random sample of U-S schools and kids, and once some of them go away, i can guarantee you that when you lose schools due to missing data, it doesn't happen randomly. so what we often do, and Katy and David will be showing you this very soon, is we plug in values on variables that we care about for schools that don't have data on those. we impute values for them. and i'd prefer that you don't just impute the grand mean, because if it's a Catholic school the grand mean is not really relevant, etcetera, so sometimes you need to get a little bit of information, and at some point we'll learn to do this. but it really is generally not okay to lose schools, so we don't wanna do that. luckily, at the within-school level, level one, you can select either pair-wise or list-wise deletion. we prefer pair-wise deletion, because then you don't lose everybody who's missing on any of those data. so generally there's kind of a rule here. the rule is don't lose data. do whatever you can not to lose it. so i suggest pair-wise deletion of missing data within schools; you can't select pair-wise deletion of missing data between schools, so you hafta do something different about that. now, the very first time you do this, i suppose it doesn't make a lotta difference if you lose a lotta schools, but you better pay attention, from the very first run, to your overall sample sizes: what you've got in your analysis compared to what you started with. and what you started with are the Ns that are in that codebook: N equals two hundred for schools and N equals three-three-oh-nine for students. so keep that in mind as you go through, and when you lose a lotta schools, pay attention, cuz every time you lose a school you lose every kid in it. 
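Two ideas from this passage can be made concrete: list-wise deletion keeps only students complete on every variable, while pair-wise deletion keeps whoever is complete on the pair being used; and a missing school-level value is better filled from similar schools (for example, the same sector) than from the grand mean. A minimal Python sketch under those assumptions; it illustrates the counting and the imputation idea only, not how H-L-M or S-P-S-S implement them, and all names and values are hypothetical.

```python
from statistics import mean


def listwise_n(rows, variables):
    """Students with complete data on *all* the listed variables."""
    return sum(all(r.get(v) is not None for v in variables) for r in rows)


def pairwise_n(rows, a, b):
    """Students usable for one pair of variables under pair-wise deletion."""
    return sum(r.get(a) is not None and r.get(b) is not None for r in rows)


def impute_by_group(schools, var, group_var):
    """Fill missing school-level values with the mean of the school's own group
    (e.g. its sector) rather than the grand mean. Returns a new dict; a value
    stays None if no school in that group has data."""
    by_group = {}
    for row in schools.values():
        if row.get(var) is not None:
            by_group.setdefault(row[group_var], []).append(row[var])
    group_means = {g: mean(vals) for g, vals in by_group.items()}
    filled = {}
    for sid, row in schools.items():
        new_row = dict(row)  # copy so the input is left untouched
        if new_row.get(var) is None:
            new_row[var] = group_means.get(row[group_var])
        filled[sid] = new_row
    return filled


# Hypothetical student rows (None = missing).
students = [
    {"math": 20, "ses": 0.1, "female": 1},
    {"math": None, "ses": -0.3, "female": 0},
    {"math": 25, "ses": None, "female": 1},
    {"math": 18, "ses": 0.4, "female": None},
]
print(listwise_n(students, ["math", "ses", "female"]))  # 1
print(pairwise_n(students, "math", "ses"))              # 2

# Hypothetical schools: C is a Catholic school with missing enrollment.
schools = {
    "A": {"sector": "public", "enrollment": 500},
    "B": {"sector": "catholic", "enrollment": 250},
    "C": {"sector": "catholic", "enrollment": None},
    "D": {"sector": "public", "enrollment": 300},
}
filled = impute_by_group(schools, "enrollment", "sector")
print(filled["C"]["enrollment"])  # the Catholic-sector mean
```

In this toy example school C receives the Catholic-school mean of 250, whereas the grand mean of the observed schools would have been 350.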
um, now there's a case, occasionally, when you might lose some schools and it's probably appropriate to lose them. i'm kind of leaping ahead a little bit, but i wanna make sure that you at least begin to think about missing data and what happens here, and you know that the program is not out to get you; it's really how you deal with the data and the program that you're dealing with. let's say, is this still (on here?) <ADJUSTS BLACKBOARDS> yeah. let's say instead of our X variable being social class, and one thing that's quite lucky in this file is that there are no missing data anywhere on social class, everybody's got it, but let's say that this X variable was gender. okay? you certainly wouldn't be centering it; mostly under those circumstances you'd have a dummy variable, and usually we'd code female as one and male as zero, and i think that's how it's coded on the file. alright. now let's say you were interested in looking at characteristics of schools that equalized gender differences in math achievement. okay? it seems like a common question, and we've got the data to look at it: we've got math tests. reading is something else, too; there are gender differences in reading for these little kids, and girls are really doing a lot better at reading, at literacy, at five years old. that i can tell you. so let's say you were interested in that; i've certainly been interested in that. alright, then you would want to model this slope as a function of characteristics of schools, cuz let's say you're interested in schools where literacy or math achievement is more equitable. alright. if you were modeling that, any school where there're only boys or only girls is gonna drop out of your analysis. 
because you can't estimate a slope inside a school where you don't have both boys and girls. and there are some single-sex schools on this file. or say you were interested in race differences. <P :06> say you were interested in black-white differences, okay? if you didn't have a black student and a white student in the school, you can't estimate that, and those kids and schools are gonna drop out of the analysis. well, that's a circumstance where it seems to me reasonable to have them go. there's no reason to be worried about characteristics of schools that equalize achievement by gender if you've only got one gender; you can't really estimate it. so that's a case when... that's not gonna happen to you in your first assignment, but by your second assignment it is going to happen to you. so i'm just telling you now: pay attention to the actual operating sample size of your schools and kids. pay good attention to them. i'm sorry, Rachelle?
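The reason single-sex schools fall out can be checked directly: with a 0/1 female dummy and an intercept, the within-school slope is just the girls' mean minus the boys' mean, which does not exist when one group is empty. A minimal Python sketch, assuming hypothetical student records with `school_id`, `female` (coded 1 for girls, 0 for boys) and `math` keys; it flags schools with no estimable gender slope and is not the H-L-M estimation itself.

```python
from statistics import mean


def within_school_gender_gap(students):
    """Per-school female-minus-male gap in the outcome.

    With a 0/1 female dummy and an intercept, the within-school OLS slope equals
    this difference in means. A school with only one gender gets None, because
    the slope cannot be estimated there.

    students: iterable of dicts with keys "school_id", "female" (0 or 1), "math"
    """
    by_school = {}
    for s in students:
        groups = by_school.setdefault(s["school_id"], {0: [], 1: []})
        groups[s["female"]].append(s["math"])
    gaps = {}
    for sid, groups in by_school.items():
        if groups[0] and groups[1]:
            gaps[sid] = mean(groups[1]) - mean(groups[0])
        else:
            gaps[sid] = None  # one gender only: no within-school slope
    return gaps


# Hypothetical data: school A is coeducational, school B enrolls only girls.
students = [
    {"school_id": "A", "female": 1, "math": 22},
    {"school_id": "A", "female": 1, "math": 24},
    {"school_id": "A", "female": 0, "math": 18},
    {"school_id": "A", "female": 0, "math": 20},
    {"school_id": "B", "female": 1, "math": 21},
    {"school_id": "B", "female": 1, "math": 23},
]
gaps = within_school_gender_gap(students)
print(gaps)  # A has a gap of 4 points; B gets None
```

Counting the None entries before fitting a random gender slope gives an early warning of how many schools will not contribute a slope estimate.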
S9: how would you filter the schools out if you want some of these schools out?
S1: well, there's select-if. S-P-S-S has select-if, and you can filter 'em out any time you want. you don't wanna do too much filtering here because H-L-M is data-hungry: it wants lotsa kids and it wants lotsa schools. so if you're only interested in public schools, for example, we've got lots of public schools. if you were only interested in doing an analysis on Catholic schools, i think you've got a problem, cuz we don't really have that many. so, with select-if, i can guarantee you, if some of you haven't been using S-P-S-S or SAS for a while, you're gonna get limbered up real fast, because any time you wanna create a composite variable, any time you wanna do any kinda data manipulation whatsoever before you read the data into H-L-M, you're gonna need S-P-S-S for that, and you're also gonna need S-P-S-S just for reading 'em in. okay. i think you're brain-weary here today.
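In S-P-S-S the filtering mentioned here is done with the SELECT IF command. The sketch below is Python rather than S-P-S-S syntax, so it only mirrors the idea: first count schools and students in each subgroup to see whether a subgroup analysis leaves H-L-M enough data, then keep the subgroup. The function names and the `sector` variable are hypothetical.

```python
def sample_sizes_by(schools, students, group_var):
    """Count schools and students in each category of a school-level variable."""
    counts = {}
    for sid, row in schools.items():
        n_kids = sum(1 for s in students if s["school_id"] == sid)
        n_sch, n_stu = counts.get(row[group_var], (0, 0))
        counts[row[group_var]] = (n_sch + 1, n_stu + n_kids)
    return counts


def select_if(schools, students, keep):
    """Rough analogue of a select-if filter: keep the schools for which
    keep(row) is True, together with their students."""
    kept = {sid: row for sid, row in schools.items() if keep(row)}
    kept_students = [s for s in students if s["school_id"] in kept]
    return kept, kept_students


# Hypothetical toy data.
schools = {
    "A": {"sector": "public"},
    "B": {"sector": "catholic"},
    "C": {"sector": "public"},
}
students = ([{"school_id": "A"}] * 3 + [{"school_id": "B"}] * 2
            + [{"school_id": "C"}] * 4)

print(sample_sizes_by(schools, students, "sector"))
# {'public': (2, 7), 'catholic': (1, 2)}
public, public_kids = select_if(schools, students, lambda r: r["sector"] == "public")
print(len(public), len(public_kids))  # 2 7
```

Checking the per-sector counts first makes it obvious when a subgroup, such as Catholic schools in this file, is too small to support a multi-level analysis.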
S10: Maria, um, could you explain the differences between this H-L-M methodology and some others, like mixed model (analysis?) are they conceptually the same or (xx) 
S1: yes, they're conceptually the same. the mixed model (unique) in SAS, that's a bit different; that's not really a multi-level program. but there're other multi-level programs, M-L-two is another one, etcetera, and these are conceptually and statistically almost identical. early on, which wasn't really all that long ago, people did pretty complicated analyses to show that if you make the same assumptions across the different programs you get exactly the same answer. now the mixed model with SAS is somewhat different, and you don't get the same answer... okay, that's it.
{END OF TRANSCRIPT}

