mirror of https://github.com/Findus23/se-simulator.git synced 2024-09-16 12:23:45 +02:00
Generating fun Stack Exchange questions using Markov chains https://se-simulator.lw1.at/

se-simulator

Generating fun Stack Exchange questions using Markov chains

Try it out: https://se-simulator.lw1.at/

Requirements

  • Python 3.5+ (only tested with Python 3.6)
  • 7z

On Debian and similar distributions, install 7z with:

sudo apt-get install p7zip-full

Setup

  • git clone with submodules
git clone https://github.com/Findus23/se-simulator
cd se-simulator
git submodule init
git submodule update
  • pip install -r requirements.txt
  • create a MySQL database called se-simulator
  • rename config.sample.py to config.py, fill in the database details, and create a secret_key
  • run create.py, which creates the database and fetches the list of SE sites
  • run apply_colors.py (which should finish quickly)
  • create folders called chains, download and raw (or symlinks to a location with more free disk space)
  • [download](https://archive.org/details/stackexchange) the .7z files for the sites you want to generate (it's recommended to start with a file <100MB)
    • If the .7z file is named differently from the site's current name, rename it to match
  • run consume.py
    • It should check the hash, move the file to raw/, unpack it, and extract the needed content from the .xml files into new .jsonl files. It also writes the file's data into the db, so it won't be imported again.
  • now the most important step: run todb.py
    • this will generate the Markov chains and save them (or reuse existing ones on the next run)
    • afterwards 100 questions will be added to the db, with corresponding answers, titles and usernames
  • run shuffle.py
    • I haven't found a performant way to get a random question other than assigning every question an integer and saving the maximum to count.txt
  • run server.py
    • this starts the Flask server on http://127.0.0.1:5000/
    • if I didn't miss an important step, the site should be working fine now.
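
The preshuffling idea behind the shuffle.py step can be sketched roughly as follows. This is only an illustration, not the project's code: it uses the standard library's sqlite3 instead of MySQL and peewee, and the table name `question`, the column `random_id` and both helper functions are assumptions made for the example. Every question gets a unique integer between 1 and N, so a random question is a single indexed lookup instead of a query that has to sort or scan the whole table.

```python
import random
import sqlite3


def preshuffle(conn):
    """Give every question a unique random_id in 1..N and return N."""
    ids = [row[0] for row in conn.execute("SELECT id FROM question")]
    positions = list(range(1, len(ids) + 1))
    random.shuffle(positions)
    conn.executemany(
        "UPDATE question SET random_id = ? WHERE id = ?",
        zip(positions, ids),
    )
    conn.commit()
    return len(ids)


def random_question(conn, count):
    """Fetch one random question via an indexed equality lookup."""
    pick = random.randint(1, count)
    return conn.execute(
        "SELECT id, title FROM question WHERE random_id = ?", (pick,)
    ).fetchone()


# Hypothetical in-memory table standing in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE question (id INTEGER PRIMARY KEY, title TEXT, random_id INTEGER)"
)
conn.executemany(
    "INSERT INTO question (title) VALUES (?)",
    [("question %d" % i,) for i in range(100)],
)
conn.execute("CREATE INDEX idx_random_id ON question (random_id)")

# The maximum is what the setup step above saves to count.txt.
count = preshuffle(conn)
print(random_question(conn, count))
```

With this layout the numbering has to be redone whenever questions are added, which would explain why the shuffle step is run after todb.py.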

other files

  • app.py: needed for Flask
  • basemodel.py and models.py: peewee ORM
  • extra_data.py: manually collected colors of every site with a custom theme
  • markov.py: extending the great markovify library for my use case
  • parsexml.py: reading in the Stack Exchange dump XML files with no more than 40MB RAM usage.
  • text_generator.py: everything that creates the content and handles the Markov chains
  • updater.py: probably not working anymore, checks for newer dump files
  • utils.py: everything else
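
Reading a large dump file with low, roughly constant memory usage, as described for parsexml.py, is typically done with a streaming parser. The following is a minimal sketch of that technique using `xml.etree.ElementTree.iterparse`; it is not the actual contents of parsexml.py, and the function name `iter_rows`, the sample data and the attribute names are assumptions made for the example.

```python
import io
import json
import xml.etree.ElementTree as ET


def iter_rows(source):
    """Yield the attributes of each <row> element without building the full tree."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "row":
            yield dict(elem.attrib)
            root.clear()  # drop processed children so memory usage stays flat


# Tiny stand-in for a dump file (element and attribute names are assumed).
sample = io.BytesIO(
    b'<?xml version="1.0" encoding="utf-8"?>'
    b"<posts>"
    b'<row Id="1" PostTypeId="1" Title="First question" />'
    b'<row Id="2" PostTypeId="2" Body="An answer" />'
    b"</posts>"
)

# Serialize one JSON object per line, as in a .jsonl file.
lines = [json.dumps(row) for row in iter_rows(sample)]
print("\n".join(lines))
```

Because each row is handed on and then discarded, memory usage depends on the size of the largest single row rather than on the size of the whole file.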