mirror of
https://github.com/Findus23/BachelorsThesis.git
synced 20240827 19:52:12 +02:00
improve text
This commit is contained in:
parent
56ae2fc1db
commit
101236d900
6 changed files with 18 additions and 11 deletions

@@ -86,4 +86,4 @@ To increase the amount of available data and especially reduce the errors caused


\label{tab:resimulationparameters}


\end{table}




This way, an additional \num{553} simulations have been calculated on \texttt{Nvidia Tesla P100} graphics cards on \texttt{Google Cloud}. (Of which 100 simulations are only used for comparison in Section \ref{comparison})


This way, an additional \num{553} simulations have been calculated on \texttt{Nvidia Tesla P100} graphics cards on \texttt{Google Cloud} (of which 100 simulations are only used for comparison in Section \ref{sec:comparison}).


@@ -5,7 +5,9 @@ For the large set of simulations we can now extract the needed values. The outpu




\section{Correlations}


\label{sec:cov}


One very easy, but sometimes flawed\footnote{\todo[inline]{explain issues with pearson}} way to look at the whole dataset at once is calculating the \textit{Pearson correlation coefficient} between the input parameters and the output water fraction (Figure \ref{fig:cov}). This shows the expected result that a higher collision angle (so a more hit-and-run like collision) has a higher water retention and a higher collision speed results in far less water left on the two largest remaining fragments. In addition higher masses seem to result in less water retention. The initial water fractions of the two bodies does seem to have very little influence on the result of the simulations.


One very easy, but sometimes flawed%


\footnote{The Pearson correlation coefficient only measures linear correlations. Even with a value close to zero there can still be a nonlinear correlation between the two dimensions. In addition, the coefficient gives no information about the steepness of the correlation, only about how closely the values conform to it.}


way to look at the whole dataset at once is calculating the \textit{Pearson correlation coefficient} between the input parameters and the output water fraction (Figure \ref{fig:cov}). This shows the expected result that a higher collision angle (so a more hit-and-run-like collision) leads to a higher water retention, while a higher collision speed results in less water left on the two largest remaining fragments. In addition, higher masses seem to result in less water retention. The initial water fractions of the two bodies seem to have very little influence on the result of the simulations.
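The limitation mentioned in the footnote can be illustrated with a minimal example (not part of the thesis code; the arrays are made up): a perfect but nonlinear dependence yields a Pearson coefficient near zero, while a linear one yields a coefficient near one.

```python
import numpy as np

# Hypothetical illustration: y is fully determined by x (y = x**2),
# yet the Pearson correlation coefficient is ~0 because the
# dependence is not linear.
x = np.linspace(-1.0, 1.0, 101)
r_nonlinear = np.corrcoef(x, x ** 2)[0, 1]

# A linear relation, in contrast, gives a coefficient of ~1.
r_linear = np.corrcoef(x, 2.0 * x + 1.0)[0, 1]
```

Here `r_nonlinear` is close to zero and `r_linear` close to one, even though both relations are equally deterministic.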




\begin{figure}


\centering





@@ -4,7 +4,7 @@


\begin{subfigure}[t]{0.5\textwidth}


\centering


\includegraphics[width=\linewidth]{images/graphviz/general.pdf}


\caption{a simple neural network}


\caption{an example of a neural network}


\label{fig:neuralnetworkgeneral}


\end{subfigure}%


~



@@ -67,7 +67,7 @@ model.fit(x, Y, epochs=200, validation_data=(x_test, Y_test))




\subsection{Training}




To find the ideal parameters to use the simulation data (excluding the data from Section \ref{sec:tests}) is split into two groups: The original set of simulations and \SI{80}{\percent} of the new simulation set is used to train the neural network while the remaining \SI{20}{\percent} are used for validation. This means that after every epoch not only the loss function for the training data is calculated, but also for the separate validation data (Figure \ref{fig:loss_val}). Finally the model with the lowest loss on the validation data set was chosen (Listing \ref{lst:model}).


To find the ideal parameters to use, the simulation data (excluding the data from Section \ref{sec:comparison}) is split into two groups: the complete original set of simulations and \SI{80}{\percent} of the new simulation set are used to train the neural network, while the remaining \SI{20}{\percent} are used for validation. This means that after every epoch the loss function is not only calculated for the training data, but also for the separate validation data (Figure \ref{fig:loss_val}). Finally, the model with the lowest loss on the validation data set was chosen (Listing \ref{lst:model}).
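The 80/20 split of the new simulation set described above can be sketched as follows (a minimal sketch, not the thesis code; the array names and the seed are illustrative):

```python
import numpy as np

# Hypothetical sketch of the 80/20 train/validation split of the
# 553 new simulations; a fixed seed keeps the split reproducible.
n_new = 553
rng = np.random.default_rng(42)
indices = rng.permutation(n_new)

n_train = int(0.8 * n_new)           # 80% of the new set for training
train_idx = indices[:n_train]        # joined with the original set
val_idx = indices[n_train:]          # remaining 20% for validation
```

The validation indices are then only used to evaluate the loss after each epoch, never to update the weights.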






\begin{figure}[h] % also temporary



@@ -85,7 +85,7 @@ To find the ideal parameters to use the simulation data (excluding the data from


\caption{loss function on the validation data}


\label{fig:val_loss}


\end{subfigure}


\caption{During training the loss function (mean squared error) decreases with every epoch until it converges to a final value}


\caption{During training the loss function (mean squared error) decreases with every epoch until it converges to a final value.}


\label{fig:loss_val}




\end{figure}



@@ -94,5 +94,3 @@ After the training the resulting model can be saved in a small \texttt{HDF5} fil






\subsection{Results}




\subsection{Issues}





@@ -1,11 +1,13 @@


% !TeX spellcheck = en_US


\section{Comparison}


\label{sec:comparison}




To compare the three methods explained above and measure their accuracy an additional set of 100 simulations (with the same properties as the ones listed in Section \ref{sec:resimulation}). These results are neither used to train or select the neural network, nor are in the dataset for griddata and RBF interpolation. Therefore we can use them to generate predictions for their parameters and compare them with the real fraction of water that remained in those simulations. By taking the mean absolute difference or the mean squared error between the predictions and the real result the accuracy of the different methods can be estimated (Table \ref{tab:comparison}). As one of these parameter sets is outside the convex hull of the training data and griddata can't extrapolate, this simulation is skipped and only the remaining 99 simulations are considered for the griddata accuracy calculation.


To compare the three methods explained above and measure their accuracy, an additional set of 100 simulations (with the same properties as the ones listed in Section \ref{sec:resimulation}) has been run. These results are neither used to train nor to select the neural network, nor are they in the dataset for griddata and RBF interpolation. Therefore, we can use them to generate predictions for their parameters and compare those with the real fraction of water that remained in these simulations. By taking the mean absolute difference and the mean squared error between the predictions and the real results, the accuracy of the different methods can be estimated (Table \ref{tab:comparison}). As one of these parameter sets is outside the convex hull of the training data and griddata can't extrapolate, this simulation is skipped and only the remaining 99 simulations are considered for the griddata accuracy calculation.
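The two accuracy metrics are straightforward to compute; as a toy illustration (the numbers below are made up, not thesis results):

```python
import numpy as np

# Toy water fractions: predictions from one method vs. the real
# simulation results (values are illustrative only).
predicted = np.array([0.30, 0.45, 0.10])
actual = np.array([0.28, 0.50, 0.05])

# Mean absolute difference and mean squared error, as used for
# the accuracy comparison between the three methods.
mae = np.mean(np.abs(predicted - actual))
mse = np.mean((predicted - actual) ** 2)
```

The MSE penalizes large outliers more strongly, which is why both metrics are reported side by side.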




Of the three methods, the trained neural network has the highest mean squared error. This seems to be\todo{more interpretations}


Of the three methods, the trained neural network has the highest mean squared error. This seems to be at least partly caused by the fact that during training the neural network generalizes the data, causing the final network to output the \enquote{smoothest} interpolation. While this causes the errors to be higher, it is possible that the fine-structured details in the simulation output are just an artifact of the simulation setup and don't represent real-world collisions.


\todo{better wording}




Another important aspect to compare is the interpolation speed. The neural network is able to give the 100 results in about \SI{4}{\milli\second} (after loading the trained model). RBF interpolation is still reasonably fast taking about \SI{8.5}{\second} (\SI{85}{\milli\second} per interpolation). But as \texttt{griddata} expects a grid-based parameter space, it becomes really slow when adding the resimulation data with random parameters. A single interpolation takes about \SI{35}{\second} totaling to around an hour for all 99 test cases. Using only the original dataset brings the runtime down to around \SI{10}{\second}, but causes the results to be less accurate than all other methods.


Another important aspect to compare is the interpolation speed. The neural network is able to give the 100 results in about \SI{4}{\milli\second} (after loading the trained model). RBF interpolation is still reasonably fast, taking about \SI{8.5}{\second} (\SI{85}{\milli\second} per interpolation). But as \texttt{griddata} expects a grid-based parameter space, it becomes really slow when adding the resimulation data with random parameters. A single interpolation takes about \SI{35}{\second}, totaling around an hour for all 99 test cases. Using only the original dataset brings the runtime down to around \SI{10}{\second}, but causes the results to be less accurate than all other methods (first row in Table \ref{tab:comparison}).
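The two scipy interpolators being timed above can be sketched on a hypothetical two-parameter toy problem (the thesis uses more input dimensions; point counts, seed, and the target function here are illustrative only):

```python
import numpy as np
from scipy.interpolate import Rbf, griddata

# Hypothetical scattered samples of a known target f(x, y) = x + y;
# the corners are included so the query point lies inside the
# convex hull (griddata cannot extrapolate outside it).
rng = np.random.default_rng(0)
corners = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
points = np.vstack([corners, rng.uniform(0.0, 1.0, size=(60, 2))])
values = points[:, 0] + points[:, 1]

query = np.array([[0.5, 0.5]])

# Piecewise-linear interpolation on a Delaunay triangulation.
g = griddata(points, values, query, method="linear")

# Radial basis function interpolation over the same scattered data.
rbf = Rbf(points[:, 0], points[:, 1], values)
r = rbf(query[:, 0], query[:, 1])
```

Both interpolators return a value close to the true \num{1.0} at the query point; the speed difference only becomes apparent at higher dimensions and larger point counts.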




\begin{table}


\centering




main.tex

@@ -50,7 +50,9 @@ To understand how the water transport works exactly one has to find an estimatio








\chapter{Placeholder}


\chapter{Other TODOs}




\todo[inline]{All captions should start with an uppercase letter and end with a .}






\nocite{*}





@@ -76,6 +76,9 @@ american, % language of the document


\colorlet{bluebookmarks}{blue}


\colorlet{blueallcolors}{blue}




% footnotes don't start new every chapter


\counterwithout{footnote}{chapter}




% Bibliography 







