I am not a computer science student so I’m trying to follow this…

I am not a computer science student, so I’m trying to follow this tutorial, but I’m struggling: https://pylangacq.org/index.html

Previously I tried to use the NLTK CHILDESCorpusReader, but there were some issues: https://www.nltk.org/howto/childes.html

Here is my Google Colab file, which is shareable as long as you have the link. I’ve tried a few things in different code cells. https://colab.research.google.com/drive/15YOKPMjMW55oAeRz6AxKnotG3P1D8LSl?usp=sharing

I’m trying to work with North American English data from the CHILDES database. Here is an index of all the XML files: https://childes.talkbank.org/data-xml/Eng-NA/
The Browsable Transcripts can be accessed here: https://sla.talkbank.org/TBB/childes/Eng-NA

I am working with these corpora:

- Brown corpus
  - Adam (no data at 5 years old)
  - Sarah (has all ages from 3 years old to 6 years old)
- MacWhinney corpus
  - Ross (has all ages from 3 years old to 6 years old)
- Garvey corpus
  - The transcripts cover all ages except 4 years 6 months

I’m only interested in files from ages 3 years old to 6 years old at 6-month marks. Therefore I need data from the following ages:

- 3 years old
- 3 years and 6 months
- 4 years old
- 4 years and 6 months
- 5 years old
- 5 years and 6 months
- 6 years old

I need to find a way to “extract” certain strings that match the part-of-speech (POS) codes used in CHILDES. Here is a screenshot and link to the codes, Section 2.4 Part of Speech Codes.
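One possible way to pick out the sessions at the 6-month marks is sketched below. It assumes each transcript is identified by a six-digit age code read as years, months, days (for example, 030011 would be 3 years, 0 months, 11 days), and that a session counts as "at the mark" when its years and months match a target age, whatever the day. The names `TARGET_AGES`, `parse_age_code` and `is_target_age` are made up for this illustration, and apart from 030011 and 030109 the sample codes are hypothetical.

```python
# Target ages as (years, months): 3;0, 3;6, 4;0, 4;6, 5;0, 5;6 and 6;0.
TARGET_AGES = {(3, 0), (3, 6), (4, 0), (4, 6), (5, 0), (5, 6), (6, 0)}


def parse_age_code(code):
    """Split a six-digit code such as '030011' into (years, months, days)."""
    if len(code) != 6 or not code.isdigit():
        raise ValueError(f"expected six digits, got {code!r}")
    return int(code[0:2]), int(code[2:4]), int(code[4:6])


def is_target_age(code):
    """Return True when the code falls on one of the 6-month marks.

    Assumption: the day part is ignored, so any session recorded during
    the target month is accepted.
    """
    years, months, _days = parse_age_code(code)
    return (years, months) in TARGET_AGES


# Hypothetical list of session codes, for illustration only.
codes = ["030011", "030109", "030618", "060000"]
selected = [code for code in codes if is_target_age(code)]
print(selected)  # ['030011', '030618', '060000']
```

If the day should matter (for instance, only sessions within the first two weeks of the month), the check in `is_target_age` would need an extra condition on the day value.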
I’m interested in pulling “strings” that have determiners, nouns, pronouns and verbs in the child’s speech.

Image transcription (English Parts of Speech table, truncated):

Category                   Code
Adjective                  adj
Adjective – Predicative    adj:pred
Adverb                     adv
Adverb – Temporal          adv:tem
Communicator               co
Complementizer             comp
Conjunction                conj
Coordinator                coord
Deter…

Image transcription (transcript excerpt from file 030011, truncated):

*URS: shall we look at these first ?
%mor: mod|shall pro:sub|we v|look prep|at det:dem|t…

030011 means this file is from when Adam is 3 years old. 030109 does not fit the criteria because Adam is 3 years 1 month old there.

Here is the list of strings I need, based on the tags they have. The * (asterisk) is a wildcard, so det:* means “any kind of determiner”. These tags are all in the %mor line of the data. If necessary, one can check the code against the Browsable Transcripts. Rather than doing this by hand, I believe it would be more efficient to create a program that can “pull” these strings. I need children’s utterances, and all lines of children speaking start with “*CHI”.

Image transcription (tag patterns with examples, truncated):

pro:int + v        [What are] you doing?
pro:int + det:*    [What the] dog (is) biting?
pro:int + n        [What girl] (is) d…

One way I thought about processing the data was copying the raw .cha data files, pasting them into Notepad, and saving them as .txt files. The task seems pretty simple since the data has been tagged and is all available, but since I don’t have a technical background, this is very challenging. Please advise on the best course of action that is efficient and makes sense to a non-technical audience, and whether it’s feasible to do all this in an .ipynb notebook. I was thinking of maybe using the .startswith() method, so if a %mor line .startswith("pro:int") then I’ll “extract” that string.

    txt = "pro:int|how mod|do&3S det:dem|this adj|open ?"
    x = txt.startswith("pro:int")
    print(x)
    # I know print(x) will only print True or False.
I need it to print the original sentence, which in this case is: “how does this open?”
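As a rough sketch of the idea described above, the snippet below pairs each *CHI line with the %mor line that follows it, checks the first tags of the %mor line against a pattern, and prints the original utterance instead of True or False. It uses only the Python standard library rather than pylangacq or NLTK. The transcript text in `SAMPLE` is hypothetical (only the "how does this open ?" pair comes from the post), the helper names are invented for illustration, and it assumes each utterance and each %mor tier fits on a single line and that patterns are matched at the start of the utterance, in the spirit of .startswith().

```python
from fnmatch import fnmatchcase

# Tag patterns visible in the post; "det:*" means any kind of determiner.
PATTERNS = [("pro:int", "v"), ("pro:int", "det:*"), ("pro:int", "n")]

# Hypothetical transcript text laid out like a .cha file (not a real excerpt).
SAMPLE = (
    "*MOT:\twhat is that ?\n"
    "%mor:\tpro:int|what cop|be&3S pro:dem|that ?\n"
    "*CHI:\thow does this open ?\n"
    "%mor:\tpro:int|how mod|do&3S det:dem|this adj|open ?\n"
    "*CHI:\twhat the dog biting ?\n"
    "%mor:\tpro:int|what det:art|the n|dog part|bite-PRESP ?\n"
)


def child_utterances(chat_text, speaker="*CHI:"):
    """Yield (utterance, mor_line) pairs for one speaker."""
    current = None
    for line in chat_text.splitlines():
        if line.startswith("*"):
            # A new main line starts; keep it only if the child is speaking.
            current = line[len(speaker):].strip() if line.startswith(speaker) else None
        elif line.startswith("%mor:") and current is not None:
            yield current, line[len("%mor:"):].strip()
            current = None


def pos_tags(mor_line):
    """Return the code before the first '|' of each token, e.g. 'pro:int'."""
    return [token.split("|", 1)[0] for token in mor_line.split() if "|" in token]


def starts_with_pattern(tags, pattern):
    """True when the first tags match the pattern; '*' acts as a wildcard."""
    if len(tags) < len(pattern):
        return False
    return all(fnmatchcase(tag, wanted) for tag, wanted in zip(tags, pattern))


pairs = list(child_utterances(SAMPLE))

# Every child utterance whose %mor line begins with pro:int.
wh_initial = [utt for utt, mor in pairs
              if starts_with_pattern(pos_tags(mor), ("pro:int",))]
print(wh_initial)  # ['how does this open ?', 'what the dog biting ?']

# Child utterances that begin with one of the two-tag patterns.
matches = [utt for utt, mor in pairs
           if any(starts_with_pattern(pos_tags(mor), p) for p in PATTERNS)]
print(matches)  # ['what the dog biting ?']
```

Real transcripts can wrap long tiers onto continuation lines and use extra markers inside %mor tokens, which this sketch does not handle. A dedicated CHAT reader such as the pylangacq library linked above is likely a more robust route for whole corpora, and the same logic can run in a Colab notebook cell once the .cha text has been read into a string.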