On StackOverflow, do posters with more experience ask their questions in fewer words?
No. There's no visible difference:
Chars of non-code:
Chars of code:
The data comes from the super-handy StackOverflow API. I retrieved it using wget and then parsed it using the rjson and XML packages.
First read in and parse the JSON:
so.R
    library(rjson)
    library(XML)
    library(ggplot2)
    library(plyr)

    # Read one saved API page and return its list of questions
    read.qs = function(path) {
      fromJSON(file = path)$questions
    }

    # Concatenate the questions from all pages into a single list
    questions = do.call(c,
      lapply(c('page-1.json', 'page-2.json', 'page-3.json'),
        read.qs
      )
    )
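The do.call(c, lapply(...)) idiom is what turns "a list of pages, each holding a list of questions" into one flat list of questions. A minimal sketch in base R, using made-up stand-in pages (page.1 and page.2 are hypothetical, not API output):

```r
# Stand-ins for parsed pages; each page is a list of question records.
page.1 = list(list(title = 'a'), list(title = 'b'))
page.2 = list(list(title = 'c'))

# do.call(c, pages) is the same as c(page.1, page.2):
# the pages' elements are concatenated into one list.
all.questions = do.call(c, list(page.1, page.2))
length(all.questions)
```

Each element of all.questions is still a single question record, which is what the per-question processing expects.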
Then for each one parse the HTML and look for <pre> and <p> tags:
so.R (cont)
    # Build one row per question: asker's reputation plus character counts
    Table = ldply(questions, function(q) {
      # Wrap the HTML fragment in <body> so it parses as one document
      body.text = sprintf('<body>%s</body>', q$body)
      body = htmlParse(body.text)

      description = tot.length.of(body, '//p//text()')
      code = tot.length.of(body, '//pre//text()')

      rep = q$owner$reputation

      data.frame(
        rep, description, code
      )
    })
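where tot.length.of is the helper below. Note that R evaluates a script top to bottom, so this definition has to run before the ldply call that uses it:
so.R (cont)
    # Total number of characters in all nodes matched by an XPath query
    tot.length.of = function(doc, query) {
      parts = xpathApply(doc, query, xmlValue)
      text = paste(parts, collapse='')
      nchar(text)
    }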
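To make the counting concrete, here is a small sketch that applies the same helper to a made-up question body (the HTML string is invented for illustration). It assumes the XML package is installed and passes asText = TRUE so the string is explicitly treated as content rather than a file name:

```r
library(XML)

# Same helper as in so.R: total characters matched by an XPath query
tot.length.of = function(doc, query) {
  parts = xpathApply(doc, query, xmlValue)
  text = paste(parts, collapse = '')
  nchar(text)
}

# Made-up question body: one paragraph of prose, one code block
html = '<body><p>How do I sum a list?</p><pre><code>sum(xs)</code></pre></body>'
doc = htmlParse(html, asText = TRUE)

n.description = tot.length.of(doc, '//p//text()')    # prose characters
n.code        = tot.length.of(doc, '//pre//text()')  # code characters
n.missing     = tot.length.of(doc, '//h1//text()')   # no match gives 0
```

A question with no <pre> block therefore gets a code length of 0, which the log-scaled plots cannot show as a finite point.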
Then make the plots:
so.R (cont)
    # Non-code characters against reputation, log-log axes
    png('description.png')
    print(ggplot(data=Table)
      + geom_point(aes(rep, description))
      + scale_x_log10()
      + scale_y_log10()
      + xlab('Rep')
      + ylab('Verbosity')
    )
    dev.off()

    # Code characters against reputation, log-log axes
    png('code.png')
    print(ggplot(data=Table)
      + geom_point(aes(rep, code))
      + scale_x_log10()
      + scale_y_log10()
      + xlab('Rep')
      + ylab('Verbosity')
    )
    dev.off()
$ Rscript so.R >/dev/null 2>&1
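The "no visible difference" reading of the plots could also be checked numerically. The following is only a sketch of one way to do that, not something the script above does: rank.cor is a hypothetical helper, and the toy data frame is invented purely to show the call.

```r
# Hypothetical helper: Spearman rank correlation between reputation
# and one of the length columns ('description' or 'code').
rank.cor = function(tbl, column) {
  cor(tbl$rep, tbl[[column]], method = 'spearman')
}

# Made-up rows with the same columns as Table; NOT data from the API.
toy = data.frame(
  rep         = c(1, 10, 100, 1000, 10000),
  description = c(300, 800, 250, 900, 400),
  code        = c(0, 120, 40, 500, 60)
)

rho.description = rank.cor(toy, 'description')
rho.code        = rank.cor(toy, 'code')
```

With the real data one would call rank.cor(Table, 'description') and rank.cor(Table, 'code'); values near zero would agree with what the scatter plots suggest. A rank correlation is used here because it is unaffected by the log scaling and copes with zero-length code.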
I'm confused. Where is there a comparison between more and less experienced?
The axes are "verbosity" vs "rep" (whatever that is - repetition maybe?) and the two plots are code vs non code. So where does experience come in??
Oh.. sorry ;( "Rep" is "Reputation", which is a score that grows as you ask and answer questions; so people with higher "Rep" are more experienced.
Ah. Perfectly clear now, thanks.