20 Assignment #02: Semantic Network Analysis
20.1 Instructions
This assignment focuses on conducting a semantic network analysis of a selected corpus. Follow the steps below to complete the assignment, and submit your work as directed by your instructor. Make sure your analysis and interpretation are clearly explained.
20.2 Steps for the Assignment
Step 1. Select Your Corpus (5 points)
- Corpus Selection:
- Choose a text corpus that interests you. Potential sources include:
- Project Gutenberg - a wide range of public domain books.
- Kaggle Datasets - various datasets that may include textual data.
- Alternatively, you may use any other text-based dataset.
- Corpus Details:
- Provide a brief description of your selected corpus (1-2 sentences).
- Mention the source and why you selected this corpus (1-2 sentences).
Step 2. Conduct Semantic Network Analysis (15 points)
- Preprocessing (4 points):
- Preprocess the text to prepare it for analysis. Typical steps may include:
- Tokenization (splitting text into words or phrases).
- Stop word removal.
- Lemmatization or stemming.
- Describe the preprocessing steps in your final submission.
- Network Creation (6 points):
- Construct a semantic network in which nodes represent words or phrases, and edges represent co-occurrence or semantic similarity.
- Use network analysis libraries such as NetworkX or igraph in Python.
- Provide a clear, labeled visualization of your semantic network.
- Modularity Analysis with Louvain Algorithm (5 points):
- Apply the Louvain algorithm to identify communities or clusters within the network.
- Interpret the results, explaining any major clusters or modules detected. Describe what these clusters might indicate about the thematic or structural aspects of your corpus.
- Include a visualization that highlights the modularity structure, if possible.
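As a starting point, the three parts of Step 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not a required approach: the toy corpus, the tiny stop-word list, the helper names `preprocess` and `build_cooccurrence_network`, and the choice of sentence-level co-occurrence as the edge definition are all made up for the example. Lemmatization is omitted for brevity, and the Louvain step assumes a recent NetworkX release that provides `louvain_communities`.

```python
import re
from collections import Counter
from itertools import combinations

import networkx as nx
from networkx.algorithms.community import louvain_communities, modularity

# Tiny illustrative stop-word list (an assumption for this sketch);
# a real analysis would use a fuller list from an NLP library.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "with", "for"}


def preprocess(text):
    """Lowercase, tokenize on runs of letters, drop stop words and short tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 2]


def build_cooccurrence_network(sentences, min_weight=1):
    """Nodes are words; an edge weight counts the sentences containing both words."""
    pair_counts = Counter()
    for sentence in sentences:
        # Sort the unique words so each pair is always counted in the same order.
        words = sorted(set(preprocess(sentence)))
        pair_counts.update(combinations(words, 2))
    graph = nx.Graph()
    for (u, v), weight in pair_counts.items():
        if weight >= min_weight:
            graph.add_edge(u, v, weight=weight)
    return graph


# Toy corpus (invented for illustration) with two themes: sea and city.
corpus = [
    "The ship sailed on the stormy sea.",
    "The captain steered the ship through the sea.",
    "The captain feared the stormy waves.",
    "The city streets filled with traffic.",
    "Traffic in the city slowed the buses.",
    "Buses crowded the narrow streets.",
]

graph = build_cooccurrence_network(corpus)

# Louvain community detection; the fixed seed makes the run repeatable.
communities = louvain_communities(graph, weight="weight", seed=42)
score = modularity(graph, communities, weight="weight")

for i, members in enumerate(communities):
    print(f"Community {i}: {sorted(members)}")
print(f"Modularity: {score:.3f}")
```

On a real corpus you would likely raise `min_weight` to prune rare pairs before clustering, and you could color nodes by their community index when drawing the graph to highlight the modularity structure.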
20.3 Submission Guidelines
- Report Format: Submit a PDF (or MS Word) report of about 2 pages (approximately 500 words), including screenshots of any visualizations.
- Python Code: Attach your annotated Python code as a separate Jupyter Notebook (.ipynb) file.
- Due Date: Submit your assignment by November 14, 2024 via the course portal.
Total: 20 points