mirror of
https://github.com/Findus23/se-simulator.git
synced 2026-04-14 04:34:19 +02:00
Generating fun Stack Exchange questions using Markov chains
https://se-simulator.lw1.at/
se-simulator
Try it out at https://se-simulator.lw1.at/
Requirements
- python 3.5+ (only tested with python 3.6)
- 7z
On Debian and similar distributions, install 7z with:
sudo apt-get install p7zip-full
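Both requirements can be checked before starting the setup. The following is only a small sketch and not part of the repository: the function name `missing_requirements` is made up, and it assumes the 7z executable is available on the PATH under the name `7z`.

```python
import shutil
import sys


def missing_requirements():
    """Return a list of human-readable problems; empty if everything is fine."""
    problems = []
    # The README asks for Python 3.5 or newer.
    if sys.version_info < (3, 5):
        problems.append("Python 3.5 or newer is required")
    # Assumption: the 7z binary is called "7z" and is on the PATH.
    if shutil.which("7z") is None:
        problems.append("7z not found on PATH (Debian: sudo apt-get install p7zip-full)")
    return problems


for problem in missing_requirements():
    print(problem)
```

If nothing is printed, both requirements appear to be met.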
Setup
- git clone with submodules and install the dependencies
git clone https://github.com/Findus23/se-simulator
cd se-simulator
git submodule init
git submodule update
pip install -r requirements.txt
- create a MySQL database called se-simulator
- rename config.sample.py to config.py, fill in the database details and create a secret_key
- run create.py, which creates the database and fetches the list of SE sites
- run apply_colors.py (which should run really quickly)
- create folders called chains, download and raw (or symlinks to somewhere with more free disk space)
- [download](https://archive.org/details/stackexchange) .7z files for the sites you want to generate (it's recommended to start with a file <100MB)
  - If the .7z has a different name than the site has now, rename it
- run consume.py
  - It should check the hash, move the file to raw/, unpack it and extract the needed content from the .xml files into new .jsonl files. It also records the file in the db, so it won't be imported again.
- now the most important step: run todb.py
  - this will generate the Markov chains and save them (or use existing ones on the next run)
  - afterwards 100 questions will be added to the db, with corresponding answers, titles and usernames
- run shuffle.py
  - I haven't found a performant way to get a random question without assigning every question an integer and saving the maximum to count.txt
- run server.py
  - this starts the Flask server on http://127.0.0.1:5000/
  - if I didn't miss an important step, the site should be working fine now.
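The idea behind the shuffle.py step can be sketched as follows. This is not the code from the repository: an in-memory SQLite database stands in for MySQL, and the table name, the column names and both function names are invented for the illustration. Every question gets an integer from 1 to N in random order, and picking a random question is then a single indexed lookup.

```python
import random
import sqlite3

# In-memory SQLite stands in for the MySQL database; the schema is hypothetical.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE question (id INTEGER PRIMARY KEY, title TEXT, random INTEGER)")
db.executemany(
    "INSERT INTO question (title) VALUES (?)",
    [("question %d" % i,) for i in range(10)],
)


def shuffle(db, rng=random):
    """Assign every question a distinct integer 1..N in random order; return N."""
    ids = [row[0] for row in db.execute("SELECT id FROM question")]
    rng.shuffle(ids)
    db.executemany(
        "UPDATE question SET random = ? WHERE id = ?",
        [(n, question_id) for n, question_id in enumerate(ids, start=1)],
    )
    # The index makes the later lookup by number cheap.
    db.execute("CREATE INDEX IF NOT EXISTS question_random ON question (random)")
    db.commit()
    # The README says the maximum is saved to count.txt; here it is just returned.
    return len(ids)


def random_question(db, count, rng=random):
    """Pick a number in 1..count and fetch the question that carries it."""
    n = rng.randint(1, count)
    return db.execute("SELECT title FROM question WHERE random = ?", (n,)).fetchone()[0]


count = shuffle(db)
print(random_question(db, count))
```

With this layout the server only needs the stored maximum and one lookup per request, so it never has to sort or scan the whole table to pick a question.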
other files
- app.py: needed for Flask
- basemodel.py and models.py: peewee ORM
- extra_data.py: manually collected colors of every site with a custom theme
- markov.py: extending the great markovify library for my use case
- parsexml.py: reading in the Stack Exchange dump XML files with no more than 40MB RAM usage
- text_generator.py: everything that creates the content and handles the Markov chains
- updater.py: probably not working anymore, checks for newer dump files
- utils.py: everything else
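For readers unfamiliar with Markov chains, here is a minimal word-level sketch in plain Python. It is neither the code in markov.py nor markovify's implementation, just an illustration of the general technique. The names `build_chain` and `generate`, the marker strings and the sample titles are all made up.

```python
import random
from collections import defaultdict

BEGIN = "__BEGIN__"
END = "__END__"


def build_chain(sentences, state_size=2):
    """Map each tuple of `state_size` words to the list of words seen after it."""
    chain = defaultdict(list)
    for sentence in sentences:
        words = [BEGIN] * state_size + sentence.split() + [END]
        for i in range(len(words) - state_size):
            state = tuple(words[i:i + state_size])
            chain[state].append(words[i + state_size])
    return dict(chain)


def generate(chain, state_size=2, rng=random, max_words=50):
    """Walk the chain from the begin state until END or max_words is reached."""
    state = (BEGIN,) * state_size
    output = []
    while len(output) < max_words:
        next_word = rng.choice(chain[state])
        if next_word == END:
            break
        output.append(next_word)
        # Slide the window: drop the oldest word, append the new one.
        state = state[1:] + (next_word,)
    return " ".join(output)


# Hypothetical training data; the real project trains on Stack Exchange dumps.
titles = [
    "How do I parse XML in Python",
    "How do I install Python on Debian",
    "Why is my Flask server not starting",
]
chain = build_chain(titles)
print(generate(chain, rng=random.Random(0)))
```

With only three training sentences the output will mostly repeat or splice the inputs. The mixing that makes generated questions fun only shows up with a large corpus.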