mirror of
https://github.com/Findus23/BachelorsThesis.git
synced 2024-08-27 19:52:12 +02:00
many improvements
This commit is contained in:
parent
ef48a239ca
commit
54ad44946b
9 changed files with 37 additions and 40 deletions

@@ -71,7 +71,7 @@ After the simulation the properties of the SPH particles needs to be analyzed. T




To increase the amount of available data and especially reduce the errors caused by the grid-based parameter choices (Table \ref{tab:first_simulation_parameters}), a second simulation run has been started. All source code and initial parameters have been left the same apart from the six main input parameters described above. These are set to a random value in the range listed in Table \ref{tab:resimulationparameters} apart from the initial water fractions. As they seem to have little impact on the outcome (see Section \ref{sec:cov}), they are set to \SI{15}{\percent} to simplify the parameter space.
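Drawing one such random parameter set might be sketched as follows. Note that the ranges below are hypothetical placeholders (the real ones are listed in Table \ref{tab:resimulationparameters}); only the fixed \SI{15}{\percent} water fraction is taken from the text.

```python
import numpy as np

rng = np.random.default_rng()

# hypothetical ranges -- the actual ones are in Table tab:resimulationparameters
parameter_ranges = {
    "total_mass":      (1e21, 1e25),   # placeholder, kg
    "mass_fraction":   (0.1, 1.0),     # placeholder
    "impact_velocity": (1.0, 5.0),     # placeholder, in escape velocities
    "impact_angle":    (0.0, 60.0),    # placeholder, degrees
}

# every parameter is drawn uniformly from its range ...
sample = {name: rng.uniform(lo, hi) for name, (lo, hi) in parameter_ranges.items()}
# ... except the water fractions, which are fixed at 15%
sample["water_fraction"] = 0.15
```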




This way, an additional \num{553} simulations have been calculated on \texttt{Nvidia Tesla P100} graphics cards on \texttt{Google Cloud}. (Of which 100 simulations are only used for comparison in Section \ref{sec:comparison})


This way, an additional \num{553} simulations have been calculated on \texttt{Nvidia Tesla P100} graphics cards on \texttt{Google Cloud}, of which 100 simulations are only used for comparison in Chapter \ref{sec:comparison}.




\begin{table}[hb]


\centering





@@ -13,6 +13,6 @@ way to look at the whole dataset at once is calculating the \textit{Pearson corr


\begin{figure}[h]


\centering


\includegraphics[width=0.6\linewidth]{images/cov.pdf}


\caption{TODO}


\caption{The Pearson correlation coefficient visualized as a bar graph}


\label{fig:cov}


\end{figure}
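A minimal sketch of how such a Pearson correlation coefficient can be computed with numpy; the two columns here are invented stand-ins for one simulation parameter and the resulting water fraction.

```python
import numpy as np

rng = np.random.default_rng(1)

# invented stand-in data: one input parameter and an anti-correlated output
impact_velocity = rng.random(100)
water_fraction = 1 - impact_velocity + rng.normal(0, 0.1, 100)

# np.corrcoef returns the 2x2 correlation matrix;
# the off-diagonal entry is the Pearson coefficient
r = np.corrcoef(impact_velocity, water_fraction)[0, 1]
```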


@@ -60,7 +60,7 @@ For doing the actual interpolations, the \texttt{scipy.interpolate.griddata} fun




\subsection{Results}




Figure \ref{fig:griddataresults}\todo{text}


Most notable about the results of the griddata interpolation (see Figure \ref{fig:griddataresults}) are the many fine details that can be seen. This is mostly caused by the fact that this method only uses the closest values for interpolation, so there is no smoothing. These details might just be random deviations of the simulations rather than a higher resolution of the data. Another thing that can be seen in the bottom right corner of Figure \ref{fig:griddata1} is that griddata can't extrapolate data.
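Both behaviors can be sketched with \texttt{scipy.interpolate.griddata}; the points and values below are invented stand-ins for the simulation parameters and water fractions.

```python
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(42)

# invented stand-in data: 200 random 2D parameter points and a value at each
points = rng.random((200, 2))
values = np.sin(points[:, 0] * 6) * points[:, 1]

# interpolate at two parameter combinations inside the data
xi = np.array([[0.5, 0.5], [0.2, 0.8]])
interpolated = griddata(points, values, xi, method="linear")

# outside the convex hull of the input points griddata cannot
# extrapolate and returns NaN
outside = griddata(points, values, np.array([[1.5, 1.5]]), method="linear")
```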




\begin{figure}[h!] % also temporary


\centering




42_rbf.tex

@@ -82,7 +82,7 @@ Solving this linear matrix equation using \texttt{numpy.linalg.solve} gives us t


\end{equation}
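The solve step can be sketched as follows for the one-dimensional example with nodes at $x=0$, $3$ and $5$; the sample values $y_i$ here are invented for illustration, the thesis's actual values yield the weights shown above.

```python
import numpy as np

# nodes of the one-dimensional example
x = np.array([0.0, 3.0, 5.0])
# sample values at the nodes (invented for illustration)
y = np.array([1.0, 2.5, 2.0])

# phi(r) = r (linear radial basis function), so A_ij = |x_i - x_j|
A = np.abs(x[:, None] - x[None, :])
w = np.linalg.solve(A, y)  # weights of the linear combination

def s(t):
    """Interpolated function s(t) = sum_i w_i * phi(|t - x_i|)."""
    return np.sum(w * np.abs(t - x))
```

By construction the interpolant $s$ passes exactly through all sample points.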




Combined we get the following linear combination for the interpolated function $s(x)$ (Figure \ref{fig:rbf1}):


\begin{equation}


\begin{equation}\label{eq:rbf}


s(x)=0.200\phi(\left\|x\right\|)+
0.798\phi(\left\|x-3\right\|)+
0.085\phi(\left\|x-5\right\|)



@@ -95,17 +95,17 @@ Combined we get the following linear combination for the interpolated function $


\begin{subfigure}[t]{0.5\textwidth}


\centering


\includegraphics[width=\linewidth]{images/rbf1.pdf}


\caption{Lorem ipsum}


\caption{The three functions making up the RBF interpolation from Equation \eqref{eq:rbf}}


\label{fig:rbf1}


\end{subfigure}%


~


\begin{subfigure}[t]{0.5\textwidth}


\centering


\includegraphics[width=\linewidth]{images/rbf2.pdf}


\caption{Lorem ipsum, lorem ipsum,Lorem ipsum, lorem ipsum,Lorem ipsum}


\caption{15 points following a sinus-like function with one interpolated value (\textcolor{Green}{\textbullet})}


\label{fig:rbf2}


\end{subfigure}


\caption{Caption place holder}


\caption{Two examples for simple RBF interpolation in one dimension}




\end{figure}





@@ -113,11 +113,11 @@ Applying the same method to a list of random points allows to interpolate their




\subsection{Implementation}




The scipy function \texttt{scipy.interpolate.Rbf} allows directly interpolating a value similar to \texttt{griddata} in Section \ref{sec:griddataimplementation}. A difference in usage is that it only allows interpolating a single value, but as it is pretty quick it is possible to calculate multiple values sequentially.


The scipy function \texttt{scipy.interpolate.Rbf} allows directly interpolating a value similarly to \texttt{griddata} in Section \ref{sec:griddataimplementation} while using the linear function as the Radial Basis Function ($\phi(r)=r$). A difference in usage is that it only allows interpolating a single value, but as it is fast, multiple values can be calculated sequentially.
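A minimal usage sketch with the linear basis function; the sample points are invented.

```python
import numpy as np
from scipy.interpolate import Rbf

# three invented sample points at x = 0, 3, 5
x = np.array([0.0, 3.0, 5.0])
y = np.array([1.0, 2.5, 2.0])

# linear radial basis function phi(r) = r
rbf = Rbf(x, y, function="linear")

interpolated = rbf(4.0)  # a single interpolated value
node_values = rbf(x)     # the interpolant is exact at the sample points
```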




\subsection{Results}




Figure \ref{fig:rbfresults} \todo{text}


The results from RBF interpolation can be seen in Figure \ref{fig:rbfresults}. They are far smoother, with a gradient from \SIrange{0}{100}{\percent} from the top left to the bottom right corner. Only the lower mass (Figure \ref{fig:rbf1}) has a few outliers. Unlike griddata, it is also possible to extrapolate to close values and still get realistic results.




\begin{figure}[h!] % also temporary


\centering




43_nn.tex

@@ -67,7 +67,7 @@ model.fit(x, Y, epochs=200, validation_data=(x_test, Y_test))




\subsection{Training}




To find the ideal parameters to use, the simulation data (excluding the data from Section \ref{sec:comparison}) is split into two groups: The complete original set of simulations and \SI{80}{\percent} of the new simulation set is used to train the neural network while the remaining \SI{20}{\percent} are used for validation. This means that after every epoch the loss function is not only calculated for the training data, but also for the separate validation data (Figure \ref{fig:loss_val}). Finally, the model with the lowest loss on the validation data set was chosen (Listing \ref{lst:model}).


To find the ideal parameters to use, the simulation data (excluding the data from Chapter \ref{sec:comparison}) is split into two groups: The complete original set of simulations and \SI{80}{\percent} of the new simulation set is used to train the neural network while the remaining \SI{20}{\percent} are used for validation. This means that after every epoch the loss function is not only calculated for the training data, but also for the separate validation data (Figure \ref{fig:loss_val}). Finally, the model with the lowest loss on the validation data set was chosen (Listing \ref{lst:model}).
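The split-validate-checkpoint procedure described above might be sketched roughly like this; the data, layer sizes and the reduced epoch count are invented placeholders (the actual model is given in Listing \ref{lst:model} and was trained for 200 epochs).

```python
import numpy as np
from tensorflow import keras

# invented stand-in data: six input parameters -> one water fraction
rng = np.random.default_rng(0)
x_all = rng.random((1000, 6))
Y_all = rng.random((1000, 1))

# 80/20 split into training and validation data
split = int(0.8 * len(x_all))
x, x_test = x_all[:split], x_all[split:]
Y, Y_test = Y_all[:split], Y_all[split:]

# a small fully connected network (layer sizes are placeholders)
model = keras.Sequential([
    keras.Input(shape=(6,)),
    keras.layers.Dense(50, activation="relu"),
    keras.layers.Dense(50, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# after every epoch the loss is also computed on the validation data;
# only the model with the lowest validation loss so far is kept on disk
checkpoint = keras.callbacks.ModelCheckpoint(
    "best_model.h5", monitor="val_loss", save_best_only=True)
model.fit(x, Y, epochs=10, validation_data=(x_test, Y_test),
          callbacks=[checkpoint], verbose=0)
```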






\begin{figure}[h] % also temporary



@@ -95,9 +95,9 @@ After the training, the resulting model is saved in a small \texttt{HDF5} file w




\subsection{Results}




Figure \ref{fig:nnresults} \todo{text}


The output of the Neural Network (Figure \ref{fig:nnresults}) looks quite similar to the RBF interpolation, but is even smoother, as the network generalizes the training data.




\begin{figure}[h!] % also temporary


\begin{figure}


\centering


\begin{subfigure}[t]{0.5\textwidth}


\centering



@@ -115,3 +115,17 @@ Figure \ref{fig:nnresults} \todo{text}


\caption{Interpolation result using the trained neural network}


\label{fig:nnresults}


\end{figure}




% !TeX spellcheck = en_US


\begin{table}


\centering


\begin{tabular}{rcc}


& {mean squared error} & {mean error} \\


griddata (only original data) & 0.014 & 0.070 \\


neural network & 0.010 & 0.069 \\


RBF & 0.008 & 0.057 \\


griddata & 0.005 & 0.046


\end{tabular}


\caption{Prediction accuracy for the different interpolation methods}


\label{tab:comparison}


\end{table}


@@ -1,23 +1,14 @@


% !TeX spellcheck = en_US


\chapter{Comparison and Conclusion}


\label{sec:comparison}




To compare the three methods explained above and measure their accuracy an additional set of 100 simulations (with the same properties as the ones listed in Section \ref{sec:resimulation}) was created. These results are neither used to train or select the neural network, nor are in the dataset for griddata and RBF interpolation. Therefore, we can use them to generate predictions for their parameters and compare them with the real fraction of water that remained in those simulations. By taking the mean absolute difference and the mean squared error between the predictions and the real result, the accuracy of the different methods can be estimated (Table \ref{tab:comparison}). As one of these parameter sets is outside the convex hull of the training data and griddata can't extrapolate, this simulation is skipped and only the remaining 99 simulations are considered for the griddata accuracy calculation.


All three methods for interpolation described above give results that follow the rough correlations from Section \ref{sec:cov}. So to compare them more precisely and measure their accuracy an additional set of 100 simulations (with the same properties as the ones listed in Section \ref{sec:resimulation}) was created. These results are neither used to train nor to select the neural network, nor are they in the dataset for griddata and RBF interpolation. Therefore, we can use them to generate predictions for their parameters and compare them with the real fraction of water that remained in those simulations. By taking the mean absolute difference and the mean squared error between the predictions and the real result, the accuracy of the different methods can be estimated (Table \ref{tab:comparison}). As one of these parameter sets is outside the convex hull of the training data and griddata can't extrapolate, this simulation is skipped and only the remaining 99 simulations are considered for the griddata accuracy calculation.
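The two accuracy measures boil down to a short computation; the values below are invented stand-ins for the predictions and the real water fractions of the comparison set.

```python
import numpy as np

# invented stand-in values for four of the comparison simulations
real = np.array([0.30, 0.55, 0.80, 0.10])       # water fractions from SPH
predicted = np.array([0.35, 0.50, 0.85, 0.20])  # one method's predictions

mean_squared_error = np.mean((predicted - real) ** 2)
mean_error = np.mean(np.abs(predicted - real))  # mean absolute difference
```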




Of the three methods, the trained neural network has the highest mean squared error. This seems to be at least partly caused by the fact that during training the neural network, the data is generalized, causing the final network to output the \enquote{smoothest} interpolations. While this causes the errors to be higher, it might be possible that the fine structured details in the simulation output is just a artifact of the simulation setup and doesn't represent real world collisions.


\todo{better wording}


Of the three methods, the trained neural network has the highest mean squared error. This seems to be at least partly caused by the fact that during training of the neural network, the data is strongly generalized, causing the final network to output the \enquote{smoothest} interpolations. While this causes the errors to be higher, it might be possible that the fine-structured details in the simulation output are just an artifact of the simulation setup and do not represent real-world collisions.




Another important aspect to compare is the interpolation speed. The neural network is able to give the 100 results in about \SI{4}{\milli\second} (after loading the trained model). RBF interpolation is still reasonably fast, taking about \SI{8.5}{\second} (\SI{85}{\milli\second} per interpolation). But as \texttt{griddata} expects a grid-based parameter space, it becomes really slow when adding the resimulation data with random parameters. A single interpolation takes about \SI{35}{\second} totaling to around an hour for all 99 test cases. Using only the original dataset brings the runtime down to around \SI{10}{\second}, but causes the results to be less accurate than all other methods. (first row in Table \ref{tab:comparison})




\begin{table}[h]


\centering


\begin{tabular}{rcc}


& {mean squared error} & {mean error} \\


griddata (only original data) & 0.014 & 0.070 \\


neural network & 0.010 & 0.069 \\


RBF & 0.008 & 0.057 \\


griddata & 0.005 & 0.046


\end{tabular}


\caption{Prediction accuracy for the different interpolation methods}


\label{tab:comparison}


\end{table}


Another important aspect to compare is the interpolation speed. The neural network is able to give the 100 results in about \SI{4}{\milli\second} (after loading the trained model, which takes approximately one second). RBF interpolation is still reasonably fast, taking about \SI{8.5}{\second} (\SI{85}{\milli\second} per interpolation). But as \texttt{griddata} expects a grid-based parameter space, it becomes really slow when adding the resimulation data with random parameters. A single interpolation takes about \SI{35}{\second}, totaling about an hour for all 99 test cases. Using only the original dataset brings the run time down to around \SI{10}{\second}, but causes the results to be less accurate than all other methods (first row in Table \ref{tab:comparison}).




All in all, interpolation using Radial Basis Functions seems to be the most reliable method if there is enough input data and this input data is mostly spread randomly across the parameter space. It is easy to implement and quite fast to execute while still giving reasonable results. Neural networks can also create realistic output, but have many more configurable parameters that need to be tuned to get usable results. Their main advantage is that, if the input set were orders of magnitude larger, only the training would take longer, while evaluating the trained model would take the same time.




To sum up, it is possible to estimate the amount of water lost in two-body collisions with arbitrary collision parameters by simulating the outcome of a large number of collisions using \texttt{SPH} and then doing linear interpolations to get results for parameters in between the ones from the simulation set. While the amount of remaining water is overestimated in this analysis, as thermal effects during the collision are ignored, the results are still better than a perfect merging assumption.







@@ -3,10 +3,10 @@




While this thesis focuses on the water retention after the collisions, the same methods can be applied to the fraction of basalt from the core of the two bodies that remains after the collision. Using the same interpolation setup and a separately trained model with the same parameters yields results similar to those for water retention. When plotted just like before in Figure \ref{fig:mass_results}, one can see that the results are quite similar. The main difference is that on average there is a slightly higher core mass retention, which can be explained by the fact that weaker collisions might be strong enough to throw the outer water layer into space, but keep the core intact. In addition, the transition between high and low core mass retention seems to be narrower.




When applying the same comparison as described in Section \ref{sec:comparison} the interpolations seem to have a lower accuracy, but still RBF interpolation gives the best results considering slow speed of griddata.


When applying the same comparison as described in Chapter \ref{sec:comparison}, the interpolations seem to have a lower accuracy, but RBF interpolation still gives the best results considering the slow speed of griddata.






\begin{table}[h]


\begin{table}


\centering


\begin{tabular}{rcc}


& {mean squared error} & {mean error} \\



@@ -61,6 +61,6 @@ When applying the same comparison as described in Section \ref{sec:comparison} t


\caption{Neural Network with $m_{total}=\num{e24}$}


\label{fig:mass_nn2}


\end{subfigure}


\caption{TODO}


\caption{}


\label{fig:mass_results}


\end{figure}





@@ -68,7 +68,6 @@


author = {Berndt Dorninger},


title = {Realistic physical collision model in planet formation},


type = {Bachelor's Thesis},


institution = {Department of Astrophysics, University of Vienna},


date = {2019},


subtitle = {Benchmark of GPU environments},


file = {:../papers/Realistic_physical_collision_model_in_planet_formation.pdf:PDF},



@@ -194,7 +193,6 @@


journal = {{Bull. Acad. Sci. URSS}},


year = {1934},


language = {French},


volume = {1934},


issue = {6},


pages = {793--800},


url = {http://mi.mathnet.ru/eng/izv4937},




main.tex

@@ -47,12 +47,6 @@


\input{60_massretention.tex}








\chapter{Other TODOs}




\todo[inline]{All captions should start with an uppercase letter and end with a .}






\printbibliography







